Rajesh Balamohan created TEZ-4211:
-------------------------------------

             Summary: Optimise MergeManager final merge
                 Key: TEZ-4211
                 URL: https://issues.apache.org/jira/browse/TEZ-4211
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Rajesh Balamohan


In some cases all of the data is held in memory and MergeManager has no disk 
segments. Currently, MergeManager still spills the in-memory segments to disk 
before proceeding with the final merge.

 

[https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/MergeManager.java#L1184]

 
{code:java}
if (numMemDiskSegments > 0 && ioSortFactor > onDiskMapOutputs.size()) {
...
..
TezMerger.writeFile(rIter, writer, progressable, 
TezRuntimeConfiguration.TEZ_RUNTIME_RECORDS_BEFORE_PROGRESS_DEFAULT);
...
..
 {code}

This can be optimised to skip the spill to disk when only in-memory segments 
are present.
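
As a rough illustration of the idea only, the sketch below models the quoted condition and one possible guard that skips the spill when there are no on-disk outputs. It is not Tez code: the class FinalMergeSketch, its method names, and the sample ioSortFactor of 100 are hypothetical, and the count 4112 is borrowed from the logs below on the assumption that every in-memory output is counted as a mem-to-disk segment. Whether such a guard is safe also depends on memory limits that the snippet above does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of the final-merge decision; not the Tez implementation.
final class FinalMergeSketch {

  /** Mirrors the condition quoted in the issue: spill whenever the sort factor allows it. */
  static boolean spillsMemSegments(int numMemDiskSegments, int ioSortFactor, int onDiskOutputs) {
    return numMemDiskSegments > 0 && ioSortFactor > onDiskOutputs;
  }

  /** Assumed variant: additionally require at least one on-disk output before spilling. */
  static boolean spillsMemSegmentsGuarded(int numMemDiskSegments, int ioSortFactor,
      int onDiskOutputs) {
    return onDiskOutputs > 0
        && spillsMemSegments(numMemDiskSegments, ioSortFactor, onDiskOutputs);
  }

  /** K-way merge of already-sorted in-memory segments, with no intermediate file. */
  static List<Integer> mergeInMemory(List<List<Integer>> segments) {
    // Heap entries are {value, segment index, offset within segment}, ordered by value.
    PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int s = 0; s < segments.size(); s++) {
      if (!segments.get(s).isEmpty()) {
        heap.add(new int[] {segments.get(s).get(0), s, 0});
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] head = heap.poll();
      merged.add(head[0]);
      List<Integer> segment = segments.get(head[1]);
      int next = head[2] + 1;
      if (next < segment.size()) {
        heap.add(new int[] {segment.get(next), head[1], next});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // 4112 in-memory outputs and 0 on-disk outputs, as in the logs; 100 is an arbitrary factor.
    System.out.println("quoted condition spills: " + spillsMemSegments(4112, 100, 0));
    System.out.println("guarded condition spills: " + spillsMemSegmentsGuarded(4112, 100, 0));
    System.out.println(mergeInMemory(Arrays.asList(Arrays.asList(1, 4), Arrays.asList(2, 3))));
  }
}
```

With no on-disk outputs the quoted condition still chooses to spill, while the guarded variant does not, and the in-memory segments can be merged directly from a heap.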

Log snippet from one of the apps (Q78):

{noformat}
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=839646500 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=859378362 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=856145179 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=849878734 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=842666749 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=839533127 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=860448335 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=844468505 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=850099810 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=849206236 and #onDiskOutputs=0, 
size=0
 [ShuffleAndMergeRunner {Map_1} ()] 
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager: 
finalMerge with #inMemoryOutputs=4112, size=840238680 and #onDiskOutputs=0, 
size=0
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
