Optimize TaskTracker memory usage
---------------------------------
Key: MAPREDUCE-2872
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: tasktracker
Affects Versions: 0.20.203.0
Reporter: Binglin Chang
Assignee: Binglin Chang
We observe high memory usage by framework-level components on slave nodes,
mainly the TaskTracker & Child, especially on large clusters. To be clear up
front: large jobs with 10000-100000 map tasks and >10000 reduce tasks are very
common in our offline cluster, and their size will very likely continue to
grow. This is reasonable because the number of map & reduce slots is in the
same range, and it's impractical for users to reduce their jobs' task counts
without an execution time penalty.
High memory consumption will:
* Limit the memory available to upper-level applications;
* Reduce page cache space, which plays an important role in spill, merge,
shuffle and even HDFS performance;
* Increase the probability of slave node OOM, which may affect the storage
layer (HDFS) too.
A stable TT with predictable memory behavior is desired; the same applies to
the Child JVM.
This issue focuses on TaskTracker memory optimization. On our cluster, the
TaskTracker uses 600M+ memory & 300%+ (3+ cores) CPU at peak, and 300M+ memory
& much less CPU on average, so we need to set -Xmx to 1000M for the TT to
prevent OOM; the TT memory then stays in the 200M-1200M range, 800M on average.
Here are some ideas:
Jetty HTTP connections use a lot of memory when there are many requests in the
queue. We could limit the length of the queue, combine multiple requests into
one, or use Netty just like MR2 does; a sketch of bounding the queue follows.
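A minimal sketch of the queue-bounding idea, assuming the Jetty 6
(org.mortbay) API that ships with Hadoop 0.20; the limits are illustrative,
not tuned values:

{code:java}
import org.mortbay.jetty.Server;
import org.mortbay.jetty.nio.SelectChannelConnector;
import org.mortbay.thread.QueuedThreadPool;

public class BoundedShuffleServer {
  public static Server create(int port) {
    Server server = new Server();

    SelectChannelConnector connector = new SelectChannelConnector();
    connector.setPort(port);
    // Cap the accept backlog so excess shuffle requests wait in the
    // kernel's listen queue instead of becoming Jetty objects on the heap.
    connector.setAcceptQueueSize(128);
    server.addConnector(connector);

    // Cap worker threads: each in-flight request pins buffers, so an
    // oversized pool turns a shuffle burst directly into heap growth.
    QueuedThreadPool pool = new QueuedThreadPool();
    pool.setMaxThreads(64);
    server.setThreadPool(pool);

    return server;
  }
}
{code}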
TaskCompletionEvents use a lot of memory too if a job has a large number of
map tasks. This won't be a problem in MR2, but it can still be optimized: a
typical TaskCompletionEvent object uses 296 bytes, so a job with 100000 maps
uses about 30M, and the problem shows up when a TaskTracker hosts several big
RunningJobs. There are more memory-efficient representations for
TaskCompletionEvent; one is sketched below.
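One possible memory-efficient representation, sketched as a columnar store:
keep the per-event fields in parallel primitive arrays and build a
TaskCompletionEvent object only when a caller asks for one.
CompactCompletionEvents is a hypothetical class, not existing Hadoop code; the
field set mirrors TaskCompletionEvent (event id, id within job, isMap, status,
tracker HTTP address):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompactCompletionEvents {
  private int size;
  private int[] eventIds = new int[1024];
  private int[] idsWithinJob = new int[1024];
  // high bit: isMap flag; low 7 bits: status ordinal
  private byte[] flags = new byte[1024];
  // tracker URLs repeat across events, so store an index into a small
  // deduplicated table instead of a String reference per event
  private int[] trackerRefs = new int[1024];
  private final List<String> trackers = new ArrayList<String>();

  public void add(int eventId, int idWithinJob, boolean isMap,
                  int statusOrdinal, String trackerHttp) {
    ensureCapacity(size + 1);
    eventIds[size] = eventId;
    idsWithinJob[size] = idWithinJob;
    flags[size] = (byte) ((isMap ? 0x80 : 0) | (statusOrdinal & 0x7F));
    int ref = trackers.indexOf(trackerHttp);  // table stays tiny, O(n) is fine
    if (ref < 0) { ref = trackers.size(); trackers.add(trackerHttp); }
    trackerRefs[size] = ref;
    size++;
  }

  public boolean isMap(int i)      { return (flags[i] & 0x80) != 0; }
  public int statusOrdinal(int i)  { return flags[i] & 0x7F; }
  public String trackerHttp(int i) { return trackers.get(trackerRefs[i]); }

  private void ensureCapacity(int needed) {
    if (needed <= eventIds.length) return;
    int cap = Math.max(needed, eventIds.length * 2);
    eventIds = Arrays.copyOf(eventIds, cap);
    idsWithinJob = Arrays.copyOf(idsWithinJob, cap);
    flags = Arrays.copyOf(flags, cap);
    trackerRefs = Arrays.copyOf(trackerRefs, cap);
  }
}
{code}

With four primitive slots per event (~13 bytes) instead of a ~296-byte object
graph, 100000 events drop from ~30M to a couple of megabytes.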
IndexCache: the memory used by the index cache varies directly with the number
of reduces, and on a large cluster 10MB of index cache is not enough, so we
set it to 100MB. Here again, using a primitive long[] instead of
IndexRecord[] can save 50% of the memory, as sketched below.
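A minimal sketch of the flattening idea: IndexRecord holds three longs
(startOffset, rawLength, partLength), so an IndexRecord[] pays an object
header plus a reference for every record, while a flat long[] lays the same
three values out back-to-back. FlatSpillIndex is a hypothetical name:

{code:java}
public class FlatSpillIndex {
  // three longs per partition: startOffset, rawLength, partLength
  private static final int FIELDS = 3;
  private final long[] records;

  public FlatSpillIndex(int numPartitions) {
    records = new long[numPartitions * FIELDS];
  }

  public void set(int partition, long startOffset, long rawLength,
                  long partLength) {
    int base = partition * FIELDS;
    records[base]     = startOffset;
    records[base + 1] = rawLength;
    records[base + 2] = partLength;
  }

  public long startOffset(int partition) { return records[partition * FIELDS]; }
  public long rawLength(int partition)   { return records[partition * FIELDS + 1]; }
  public long partLength(int partition)  { return records[partition * FIELDS + 2]; }
}
{code}

On a 64-bit JVM an IndexRecord costs roughly 16 bytes of header + 24 bytes of
fields + an 8-byte reference, about 48 bytes per record, versus 24 bytes in
the flat array, which matches the ~50% figure above.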
Although some of the above won't be a problem in MRv2, MRv1 is still widely
used, so I think these optimizations are worth doing.