Hi,
Yes, you are correct. The references to the old keyval buffers are still
there even after the buffers are re-initialized, but only between
consecutive spills. Before HADOOP-1965, the memory used for one
sort-spill phase was io.sort.mb, making the maximum memory usage
(2 * io.sort.mb). Post HADOOP-1965, the total memory used for one
sort-spill phase is io.sort.mb/2, the maximum memory usage is
io.sort.mb, and the time between two consecutive spills is also reduced
since sort and spill happen in parallel. Thanks for pointing it out. I
have opened HADOOP-2782 to address this.
Amar
Travis Woodruff wrote:
Well, this is what I get for not doing my homework first.
I pulled down the latest code from trunk, and it looks like the updates for
HADOOP-1965 have changed this code significantly. From what I can tell, these
changes have removed the issue; however, the problem still exists in the 0.15
branch.
Travis
----- Original Message ----
From: Travis Woodruff <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, February 4, 2008 6:41:31 PM
Subject: Possible memory "leak" in MapTask$MapOutputBuffer
I have been using Hadoop for a couple of months now, and I recently moved to
an x86_64 platform. When I ran some jobs that I've run previously on the
32-bit cluster, I got OutOfMemoryError on a large number of map tasks. I
initially chalked it up to 64-bit object overhead being a bit higher and
increased my task process heap size from 512M to 650M. After increasing it,
the OOMEs have decreased, but I'm still seeing them occasionally, so I did
some poking around in a heap snapshot, and I think I've found a potential
problem with the way the sort buffer is being cleaned up.

After MapOutputBuffer calls sortAndSpillToDisk(), it iterates over all the
sortImpls and calls close(). This close nulls the keyValBuffer member of
BasicTypeSorterBase; however, it does not clear the references in the
sorter's comparator (WritableComparator.buffer). Because of this, I think
it's possible for the old buffer (or even multiple old buffers) to not be
GC'd. If one or more partitions' sorters are used for sorting one buffer's
contents but not for the next, the comparators for the sorters for the first
set of partitions will hold a reference to the first buffer even after the
new buffer is created.

Please let me know if you agree with this assessment. If this is indeed a
problem, it could (at least partially) explain some of the mysterious memory
usage discussed in HADOOP-2751.

Thanks,
Travis
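
The retention pattern described above can be sketched in a few lines. This
is not the actual Hadoop source; the Sorter and BufferComparator classes
below are hypothetical stand-ins for BasicTypeSorterBase and
WritableComparator, reduced to show how nulling one field is not enough
when a collaborator has cached its own reference to the same buffer:

```java
// Hypothetical simplified sketch of the leak pattern, not Hadoop code.
class BufferComparator {
    // Mirrors the idea of WritableComparator.buffer: a reference
    // cached during a compare call and never cleared afterwards.
    byte[] buffer;

    int compare(byte[] b, int off1, int off2) {
        this.buffer = b;            // reference retained after the call
        return b[off1] - b[off2];
    }
}

class Sorter {
    byte[] keyValBuffer = new byte[8];
    final BufferComparator comparator = new BufferComparator();

    void sort() {
        comparator.compare(keyValBuffer, 0, 1);
    }

    void close() {
        keyValBuffer = null;        // comparator.buffer is NOT cleared
    }
}

public class LeakSketch {
    public static void main(String[] args) {
        Sorter sorter = new Sorter();
        byte[] firstBuffer = sorter.keyValBuffer;
        sorter.sort();
        sorter.close();
        // The sorter itself no longer references the buffer...
        System.out.println(sorter.keyValBuffer == null);
        // ...but its comparator still does, so the buffer stays reachable
        // and cannot be garbage collected:
        System.out.println(sorter.comparator.buffer == firstBuffer);
    }
}
```

Running this prints `true` twice: after close(), the old buffer is still
strongly reachable through the comparator. With one comparator per
partition, several old buffers can be pinned this way at once, which
matches the multi-buffer retention described in the message above.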