[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103716#comment-16103716 ]
Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------
Ran orderedwordcount with the following options:
-Dtez.shuffle-vertex-manager.enable.auto-parallel=true
-Dtez.runtime.io.sort.factor=4
-Dtez.runtime.shuffle.memory-to-memory.enable=true
Sorted the output (via sort) and diff'd it against the output of
orderedwordcount without the changes.
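
The sort-and-diff check above can be approximated in a few lines of Python. This is only a sketch: the function name and the file paths passed to it are hypothetical, and it assumes each output line is one record whose position in the file does not matter.

```python
from pathlib import Path


def same_lines_ignoring_order(path_a, path_b):
    """Return True if both files hold the same lines, regardless of order."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    # Sorting both sides first mirrors running `sort` on each output
    # and then diffing the sorted results.
    return sorted(lines_a) == sorted(lines_b)
```

A True result corresponds to an empty diff between the two sorted outputs.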
Also turned on the '"writeFile SAME_KEY count=" + count' log line in
TezMerger.writeFile to confirm that the RLE case is hit during the in-memory
merge:
2017-07-27 18:19:18,128 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer: Initiating Memory-to-Memory merge with 4 segments of total-size: 22182024
2017-07-27 18:19:18,770 [INFO] [MemToMemMerger [Tokenizer]] |impl.TezMerger|: writeFile SAME_KEY count=1544269
2017-07-27 18:19:18,771 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer Memory-to-Memory merge of the 4 files in-memory complete with mergeOutputSize=22182024
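
To illustrate what a same-key count means in a merge like the one logged above, here is a minimal Python sketch. It is not Tez code: the function name, the REPEAT_KEY sentinel, and the interpretation that the counter tracks records whose key equals the previous key in the merged stream are all assumptions made for illustration.

```python
import heapq

# Hypothetical sentinel standing in for a "same key as previous record" marker.
REPEAT_KEY = object()


def merge_with_same_key_count(segments):
    """Merge sorted (key, value) segments and count repeated-key records."""
    merged = []
    same_key_count = 0
    previous_key = None
    have_previous = False
    # heapq.merge performs a k-way merge over already-sorted inputs.
    for key, value in heapq.merge(*segments, key=lambda record: record[0]):
        if have_previous and key == previous_key:
            # Run-length style encoding: emit a marker instead of the key.
            same_key_count += 1
            merged.append((REPEAT_KEY, value))
        else:
            merged.append((key, value))
        previous_key = key
        have_previous = True
    return merged, same_key_count
```

With four small segments such as [("a", 1), ("b", 1)], [("a", 1), ("c", 1)], [("b", 1)] and [("a", 1)], three of the six merged records repeat the preceding key, so only three distinct keys need to be written out in full.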
> Reduce Object size of InMemoryMapOutput for large jobs
> ------------------------------------------------------
>
> Key: TEZ-3752
> URL: https://issues.apache.org/jira/browse/TEZ-3752
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Muhammad Samir Khan
> Attachments: TEZ-3752.001.patch
>
>
> Follow-on jira from TEZ-3732. The InMemoryMapOutput has a
> BoundedByteArrayOutputStream that is only used in the Merged MapOutput case.