[ 
https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933962#comment-14933962
 ] 

Siddharth Seth commented on TEZ-2850:
-------------------------------------

bq. How do we estimate the size of the segments, since it may vary for each map 
output?
I mean the size of the segment data structure in memory, which should be 
independent of the data size. Looking at the heap images you've posted, this is 
approximately 5.5K?
At 3% of the memory allocated for shuffle, that comes to about 1024 segments 
for a 200MB allocation.
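
The estimate above can be checked with a short calculation. This is only a rough sketch: it assumes 5.5K means 5.5 * 1024 bytes per segment object and treats the 3% as a budget for segment bookkeeping. The class and method names are hypothetical and not part of Tez.

```java
// Hypothetical helper, not part of Tez: estimates how many in-memory
// segment objects fit in a fixed share of the shuffle memory.
final class SegmentBudget {
    private SegmentBudget() {}

    /**
     * @param shuffleMemoryBytes memory allocated for shuffle, in bytes
     * @param budgetPercent      share of that memory assumed for segment objects (0-100)
     * @param bytesPerSegment    assumed in-memory size of one segment object, in bytes
     * @return number of whole segment objects that fit in the budget
     */
    static long maxSegments(long shuffleMemoryBytes, int budgetPercent, long bytesPerSegment) {
        if (shuffleMemoryBytes < 0 || budgetPercent < 0 || budgetPercent > 100
                || bytesPerSegment <= 0) {
            throw new IllegalArgumentException("invalid budget parameters");
        }
        // Integer arithmetic avoids floating-point rounding; multiplyExact
        // throws ArithmeticException instead of silently overflowing.
        long budgetBytes = Math.multiplyExact(shuffleMemoryBytes, (long) budgetPercent) / 100;
        return budgetBytes / bytesPerSegment;
    }

    public static void main(String[] args) {
        long shuffleMemory = 200L * 1024 * 1024; // 200MB allocation
        long perSegment = 5632L;                 // 5.5K read as 5.5 * 1024 bytes (assumption)
        System.out.println(maxSegments(shuffleMemory, 3, perSegment)); // prints 1117
    }
}
```

With these inputs the result is 1117, which is in the same range as the figure of about 1024 quoted above.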

bq. What should be the default number of segments (should it be 0, so that 0 
means ignore this setting)?
A high number. Something like 4096. 0 would disable the checks.


mapreduce.reduce.merge.inmem.threshold in Hadoop corresponds to 
tez.runtime.shuffle.memory-to-memory.segments - which indicates the number of 
segments after which an in-mem merge will be triggered, if enabled. The setting 
proposed here is slightly different - it's a limit on the segments, but it 
triggers a disk merge instead of an in-mem merge. It'll have to be consolidated 
with in-mem merge once that is tested properly with Tez.
The property could be named tez.runtime.shuffle.in-memory.segments.max
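
To make the proposed semantics concrete, here is a minimal sketch of how such a limit might behave. It is not Tez code. The class name, the use of java.util.Properties in place of the real configuration object, and the choice to trigger once the count reaches the limit (rather than exceeds it) are all assumptions. The property name and the 4096 default are the ones suggested in this comment.

```java
import java.util.Properties;

// Hypothetical sketch, not Tez code: a cap on the number of in-memory
// segments that triggers a disk merge, where 0 disables the check.
final class InMemorySegmentLimit {
    // Property name as proposed in this comment; it may not exist in Tez.
    static final String MAX_SEGMENTS_KEY = "tez.runtime.shuffle.in-memory.segments.max";
    // "Something like 4096", used here as the default.
    static final int DEFAULT_MAX_SEGMENTS = 4096;

    private final int maxSegments;

    InMemorySegmentLimit(Properties conf) {
        String raw = conf.getProperty(MAX_SEGMENTS_KEY);
        int value = (raw == null) ? DEFAULT_MAX_SEGMENTS : Integer.parseInt(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(MAX_SEGMENTS_KEY + " must be >= 0, got " + value);
        }
        this.maxSegments = value;
    }

    /** Returns true when the in-memory segment count has reached the limit. */
    boolean shouldMergeToDisk(int inMemorySegments) {
        if (maxSegments == 0) {
            return false; // 0 disables the check
        }
        // Assumption: trigger when the count reaches the limit.
        return inMemorySegments >= maxSegments;
    }
}
```

A caller would consult shouldMergeToDisk after each segment is committed to memory and start a disk merge when it returns true, independently of any in-mem merge threshold.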

> Tez MergeManager OOM for small Map Outputs
> ------------------------------------------
>
>                 Key: TEZ-2850
>                 URL: https://issues.apache.org/jira/browse/TEZ-2850
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Saikat
>            Assignee: Saikat
>         Attachments: OOM_1.png, OOM_2.png, OOM_3.png, TEZ-2850_test.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
