Earlier key-value buffer from MapTask.java is still referenced even though it is
no longer required.
----------------------------------------------------------------------------------------------------
Key: HADOOP-2782
URL: https://issues.apache.org/jira/browse/HADOOP-2782
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: Amar Kamat
Priority: Critical
Consider the following sequence of events for a map task.
Before HADOOP-1965:
|| Stage || Description || Buffers used || Memory used ||
| Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb |
| Stage-2 | KeyVal1 buffer is full and needs spilling so Sort-Spill starts | KeyVal1 (by Sort-Spill) | io.sort.mb |
| Stage-3 | Sort-Spill finished | KeyVal1 (referenced by comparator) | io.sort.mb |
| Stage-4 | MapOutputBuffer starts collecting | KeyVal2 (by collect) + KeyVal1 (by comparator) | 2*io.sort.mb |
| Stage-5 | KeyVal2 buffer is full and needs spilling so Sort-Spill starts | KeyVal2 (by Sort-Spill) | io.sort.mb |
So between Stage-4 and Stage-5 the memory used is {{2 * io.sort.mb}}. This can
be avoided entirely by clearing the comparator's reference to the earlier
key-value buffer, which caps the maximum memory usage at {{io.sort.mb}}.
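The effect of the stale comparator reference can be illustrated with a minimal
sketch. This is not the MapTask.java code: {{BufferComparator}}, {{release()}}
and {{holdsBuffer()}} are hypothetical names, and the comparison logic is
reduced to a single byte per record for brevity.

```java
import java.util.Comparator;

// Hypothetical stand-in for a comparator that compares records by their
// offsets into a shared key-value byte buffer. Not the actual MapTask code.
class BufferComparator implements Comparator<Integer> {
    // Strong reference: while it is set, the buffer cannot be collected.
    private byte[] keyValBuffer;

    BufferComparator(byte[] keyValBuffer) {
        this.keyValBuffer = keyValBuffer;
    }

    @Override
    public int compare(Integer a, Integer b) {
        // Simplified: compare one byte at each offset.
        return Byte.compare(keyValBuffer[a], keyValBuffer[b]);
    }

    // Drop the reference once the sort-spill has finished.
    void release() {
        keyValBuffer = null;
    }

    boolean holdsBuffer() {
        return keyValBuffer != null;
    }
}

class BufferReferenceSketch {
    public static void main(String[] args) {
        byte[] keyVal1 = new byte[1024];
        BufferComparator cmp = new BufferComparator(keyVal1);
        // ... sort and spill using cmp ...
        cmp.release();   // without this, cmp keeps KeyVal1 reachable
        keyVal1 = null;  // the collector itself moves on to a new buffer
        System.out.println(cmp.holdsBuffer());
    }
}
```

If {{release()}} is never called, the old buffer stays reachable through the
comparator for as long as the comparator itself is alive, even after the
collector has switched to a new buffer.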
After HADOOP-1965:
|| Stage || Description || Buffers used || Memory used ||
| Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb/2 |
| Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal1 (by Sort-Spill) | io.sort.mb/2 |
| Stage-3 | MapOutputBuffer simply collects + Sort-Spill | KeyVal2 (by collect) + KeyVal1 (by Sort-Spill) | io.sort.mb |
| Stage-4 | MapOutputBuffer simply collects + Sort-Spill finishes; the Sort-Impls are closed but the comparators still hold a reference to the KeyVal1 buffer | KeyVal2 (by collect) + KeyVal1 (referenced by comparator) | io.sort.mb |
| Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal2 (by Sort-Spill) | io.sort.mb/2 |
So between Stage-4 and Stage-5 there is an unwanted reference to the KeyVal1
buffer, which prevents the GC from reclaiming it. However, the maximum memory
usage is still {{io.sort.mb}}.
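The peak figures in the two stage tables can be tallied with a short sketch.
The per-stage values are copied from the "Memory used" columns, expressed in
halves of {{io.sort.mb}}. The class and method names are hypothetical, and 100
is only an example value for {{io.sort.mb}}.

```java
import java.util.Arrays;

class SortBufferPeak {
    // Peak memory in MB, given per-stage usage in halves of io.sort.mb.
    static int peakMb(int[] halvesPerStage, int ioSortMb) {
        int maxHalves = Arrays.stream(halvesPerStage).max().orElse(0);
        return maxHalves * ioSortMb / 2;
    }

    public static void main(String[] args) {
        int ioSortMb = 100;                      // example value only
        int[] before1965 = {2, 2, 2, 4, 2};      // Stage-4 holds two full buffers
        int[] after1965  = {1, 1, 2, 2, 1};      // never more than two half buffers
        System.out.println("before: " + peakMb(before1965, ioSortMb) + " MB");
        System.out.println("after:  " + peakMb(after1965, ioSortMb) + " MB");
    }
}
```

With these inputs the peak before HADOOP-1965 is twice {{io.sort.mb}}, while
after HADOOP-1965 it stays at {{io.sort.mb}}; the stale reference then delays
reclamation rather than raising the peak.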