I see the point of confusion. I'm looking at the branch-0.15 version, which does not (as far as I can tell) re-instantiate the sorters each time (they're instantiated in the MapOutputBuffer constructor). You are completely correct about the trunk version.
http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.15/src/java/org/apache/hadoop/mapred/MapTask.java?revision=614598&view=markup Travis ----- Original Message ---- From: Amar Kamat <[EMAIL PROTECTED]> To: [email protected] Sent: Tuesday, February 5, 2008 12:59:24 PM Subject: Re: Possible memory "leak" in MapTask$MapOutputBuffer Travis Woodruff wrote: > I think it's possible to retain even more than two copies of the keyval buffer. > > This can happen because the comparator's buffer is set only when a comparison is performed, so if no data exists for a partition in a spill cycle, the partition's comparator will retain the buffer from a previous spill. > > The sorter is the one that holds the reference to the comparator and all the sorters (for all the partitions) are re-assigned before every spill cycle. So all the earlier references will be lost. Similarly for the first collect or the collect immediately after the spill, all the sorters are re-initialized (for all the partitions). Hence I don't think there is any way the buffers could be referenced beyond single level. See HADOOP-2782 for more details. > Take this scenario: > > We have 3 partitions: A,B,C > a keyval buffer that can hold 3 keys > and an input data set that generates output keys in the following sequence: AAABBBCCC > > A > A > A > -- Spill occurs here. A's comparator has keyval buffer 1. > B's and C's are null. > B > B > B > Overflow occurs here causing sorters to be re-assigned. All the references are lost. > -- Spill occurs here. B's comparator has keyval buffer 2. A's comparator has keyval buffer 1. > A's and C's are null. > C > C > C > Overflow occurs here causing the sorters to be re-assigned. Again all the references are lost. > -- Spill occurs here. C's comparator has keyval buffer 3. B's A's comparator has keyval buffer 2. A's comparator has keyval buffer 1. > A's and B's are null here. The simple fix would be as follows : Index: src/java/org/apache/hadoop/mapred/MapTask.java =================================================================== --- src/java/org/apache/hadoop/mapred/MapTask.java (revision 618559) +++ src/java/org/apache/hadoop/mapred/MapTask.java (working copy) @@ -505,9 +505,7 @@ sortSpillException = ioe; } finally { // make sure that the collector never waits indefinitely pendingKeyvalBuffer = null; - for (int i = 0; i < partitions; i++) { - pendingSortImpl[i].close(); - } + pendingSortImpl = null; pendingKeyvalBufferLock.notify(); } } > > This is fairly unlikely, but I think it happening to me occasionally because I'm seeing OOMEs I can't explain any other way. My heap is more than large enough to support two 100M buffers. > > FYI, I added code to clear the comparator's buffer (see patch below), and a job that was failing with 650M heaps now succeeds with 512M. > > > > Travis > > > Index: src/java/org/apache/hadoop/io/WritableComparator.java > =================================================================== > --- src/java/org/apache/hadoop/io/WritableComparator.java (revision 618649) > +++ src/java/org/apache/hadoop/io/WritableComparator.java (working copy) > @@ -78,6 +78,11 @@ > } > } > > + /** Free up memory used by the internal DataInputBuffer. */ > + public void clearBuffer() { > + buffer.reset(new byte[] {}, 0, 0); > + } > + > /** Optimization hook. Override this to make SequenceFile.Sorter's scream. > * > * <p>The default implementation reads the data into two [EMAIL PROTECTED] > Index: src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java > =================================================================== > --- src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java (revision 618649) > +++ src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java (working copy) > @@ -124,6 +124,7 @@ > //release the large key-value buffer so that the GC, if necessary, > //can collect it away > keyValBuffer = null; > + comparator.clearBuffer(); > } > //A compare method that references the keyValBuffer through the indirect > //pointers > > > > > ----- Original Message ---- > From: Amar Kamat <[EMAIL PROTECTED]> > To: [email protected] > Sent: Tuesday, February 5, 2008 12:08:48 AM > Subject: Re: Possible memory "leak" in MapTask$MapOutputBuffer > > > Hi, > Yes, > you > are > correct. > The > reference > to > the > old > keyval > buffers > are > still > there > even > after > the > buffers > are > re-initialized > but > the > reference > is > there > just > between > the > consecutive > spills. > The > scenario > before > HADOOP-1965 > was > that > the > memory > used > for > one > sort-spill > phase > is > io.sort.mb > causing > the > max > memory > usage > to > be > (2 > * > io.sort.mb). > Post > HADOOP-1965, > the > total > memory > used > for > once > sort-spill > phase > is > io.sort.mb/2, > the > max > memory > usage > is > io.sort.mb > and > the > time > duration > between > two > consecutive > spills > is > also > reduced > since > they > happen > in > parallel. > Thanks > for > pointing > it > out. > I > have > opened > HADOOP-2782 > addressing > the > same. > Amar > Travis > Woodruff > wrote: > > Well, > this > is > what > I > get > for > not > doing > my > homework > first. > >> > I > pulled > down > the > latest > code > from > trunk, > and > it > looks > like > the > updates > for > HADOOP-1965 > have > changed > this > code > significantly. > >From > what > I > can > tell, > these > changes > have > removed > the > issue; > however, > the > problem > still > exists > in > the > 0.15 > branch. > >> >> > Travis > >> > ----- > Original > Message > ---- > > From: > Travis > Woodruff > <[EMAIL PROTECTED]> > > To: > [email protected] > > Sent: > Monday, > February > 4, > 2008 > 6:41:31 > PM > > Subject: > Possible > memory > "leak" > in > MapTask$MapOutputBuffer > > > <snip for goofy formatting> > > > > ____________________________________________________________________________________ > Never miss a thing. Make Yahoo your home page. > http://www.yahoo.com/r/hs > ____________________________________________________________________________________ Looking for last minute shopping deals? Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping
