[
https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933962#comment-14933962
]
Siddharth Seth commented on TEZ-2850:
-------------------------------------
bq. How do we estimate the size of the segments, since it may vary for each map
output?
I mean the size of the segment data structure in memory. That should be
independent of the data size. Looking at the heap images you've posted, this is
approximately 5.5K?
Budget 3% of the memory allocated for shuffle for these structures. That comes
to about 1024 segments for a 200MB allocation.
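The arithmetic behind that estimate can be sketched as below. This is only an illustration: the class and method names are hypothetical, the 5.5K per-segment overhead is the rough figure read off the heap images, and the 3% share is the suggestion above, not a measured constant.

```java
// Hypothetical helper, not part of Tez: estimates how many in-memory
// segment structures fit in a fixed share of the shuffle memory.
final class SegmentBudget {
    // Assumption: ~5.5K of heap per segment data structure (5.5 * 1024 bytes).
    static final long SEGMENT_OVERHEAD_BYTES = 5632L;
    // Assumption: 3% of the shuffle memory is set aside for these structures.
    static final long OVERHEAD_PERCENT = 3L;

    static long maxSegments(long shuffleMemoryBytes) {
        // Integer arithmetic avoids floating-point rounding in the budget.
        long budgetBytes = shuffleMemoryBytes * OVERHEAD_PERCENT / 100L;
        return budgetBytes / SEGMENT_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        long shuffleMemory = 200L * 1024 * 1024; // 200MB allocation
        System.out.println(maxSegments(shuffleMemory));
    }
}
```

For a 200MB allocation this works out to a little over 1100 segments, which is in line with the "about 1024" figure.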
bq. What should be the default number of segments (should it be 0, so that 0
means ignore this setting)?
A high number. Something like 4096. 0 would disable the checks.
mapreduce.reduce.merge.inmem.threshold in Hadoop corresponds to
tez.runtime.shuffle.memory-to-memory.segments - which indicates the number of
segments after which an in-mem merge will be triggered, if enabled. This is
slightly different - it's a limit on the segments, but triggers a disk merge
instead of an in-mem merge. It'll have to be consolidated with in-mem merge
once that is tested properly with Tez.
The property could be named tez.runtime.shuffle.in-memory.segments.max
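A minimal sketch of the semantics proposed above, assuming the limit is checked against the current count of in-memory segments. The class and method names are hypothetical and this is not the MergeManager code; whether the merge fires at the limit or only once it is exceeded is also an assumption.

```java
// Hypothetical illustration of the proposed setting
// tez.runtime.shuffle.in-memory.segments.max (name suggested above).
final class InMemorySegmentLimit {
    // Suggested default from the discussion: a high number such as 4096.
    static final int DEFAULT_MAX_SEGMENTS = 4096;

    private final int maxSegments; // 0 disables the check

    InMemorySegmentLimit(int maxSegments) {
        if (maxSegments < 0) {
            throw new IllegalArgumentException("maxSegments must be >= 0");
        }
        this.maxSegments = maxSegments;
    }

    // Returns true when a disk merge (not an in-mem merge) should be triggered.
    // Assumption: reaching the limit is enough to trigger the merge.
    boolean shouldMergeToDisk(int inMemorySegments) {
        return maxSegments > 0 && inMemorySegments >= maxSegments;
    }
}
```

With the default of 4096, a count of 4095 segments would not trigger a merge and 4096 would; with a value of 0 the check never fires.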
> Tez MergeManager OOM for small Map Outputs
> ------------------------------------------
>
> Key: TEZ-2850
> URL: https://issues.apache.org/jira/browse/TEZ-2850
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Saikat
> Assignee: Saikat
> Attachments: OOM_1.png, OOM_2.png, OOM_3.png, TEZ-2850_test.patch
>
>