[ 
https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103716#comment-16103716
 ] 

Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------

Ran orderedwordcount with the following settings:
-Dtez.shuffle-vertex-manager.enable.auto-parallel=true 
-Dtez.runtime.io.sort.factor=4 
-Dtez.runtime.shuffle.memory-to-memory.enable=true

Sorted the output (via sort) and diff'd it against the output from an 
orderedwordcount run without the changes.
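
The sort-and-diff comparison above can be sketched roughly as follows. This is only an illustration of the idea, not the commands actually run; the function names are hypothetical, and it assumes both outputs have already been copied to local text files.

```python
from pathlib import Path


def sorted_lines(path):
    """Return the file's lines in sorted order.

    Python's ordering may differ from the locale-aware `sort` utility,
    but that does not matter for an equality check as long as both
    files are sorted the same way.
    """
    return sorted(Path(path).read_text(encoding="utf-8").splitlines())


def outputs_match(baseline_path, patched_path):
    """Return True when both files hold the same lines, ignoring order."""
    return sorted_lines(baseline_path) == sorted_lines(patched_path)
```

Calling outputs_match("baseline.txt", "patched.txt") (both file names are placeholders) returns True only when the two runs produced the same set of lines with the same multiplicities.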

Also enabled the '"writeFile SAME_KEY count=" + count' log line in 
TezMerger.writeFile to confirm that the RLE case is hit during the in-memory 
merge:
2017-07-27 18:19:18,128 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer: Initiating Memory-to-Memory merge with 4 segments of total-size: 22182024
2017-07-27 18:19:18,770 [INFO] [MemToMemMerger [Tokenizer]] |impl.TezMerger|: writeFile SAME_KEY count=1544269
2017-07-27 18:19:18,771 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer Memory-to-Memory merge of the 4 files in-memory complete with mergeOutputSize=22182024
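
For anyone repeating this check on a larger task log, a small script along these lines could pull out the SAME_KEY counts. It is a sketch under assumptions: the helper name is hypothetical, and the pattern is based only on the message format shown in the log excerpt above.

```python
import re

# Matches the message text seen in the log excerpt above.
SAME_KEY_RE = re.compile(r"writeFile\s+SAME_KEY\s+count=(\d+)")


def same_key_counts(log_text):
    """Return every SAME_KEY count found in the log text, in order."""
    # Collapse all whitespace first so that entries hard-wrapped by a
    # mail client or terminal are still matched.
    flattened = " ".join(log_text.split())
    return [int(count) for count in SAME_KEY_RE.findall(flattened)]
```

A non-empty result with a count greater than zero would indicate that the same-key path was exercised for that merge.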

> Reduce Object size of InMemoryMapOutput for large jobs
> ------------------------------------------------------
>
>                 Key: TEZ-3752
>                 URL: https://issues.apache.org/jira/browse/TEZ-3752
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Muhammad Samir Khan
>         Attachments: TEZ-3752.001.patch
>
>
> Follow-on jira from TEZ-3732. The InMemoryMapOutput has a 
> BoundedByteArrayOutputStream that is only used in the Merged MapOutput case. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
