[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-2872.
-----------------------------------------
    Resolution: Won't Fix

Closing this as won't fix, given development on hadoop-1 has effectively 
stopped.

> Optimize TaskTracker memory usage
> ---------------------------------
>
>                 Key: MAPREDUCE-2872
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>              Labels: memory, optimization
>
> We observe high memory usage of framework level components on slave node, 
> mainly TaskTracker & Child, especially for large clusters. To be clear at 
> first, large jobs with 10000-100000 map and >10000 reduce tasks are very 
> common in our offline cluster, and will very likely continue to grow. This is 
> reasonable because the number of map & reduce slots are in the same range, 
> and it's impractical for users to reduce their job's task number without 
> execution time penalty. 
> High memory consumption will:
> * Limit the memory used by up level application; 
> * Reduce page cache space, which plays a  important role in spill, merge, 
> shuffle and even HDFS performance; 
> * Increase the probability of slave node OOM, which may affect storage 
> layer(HDFS) too. 
> A stable TT with predictable memory behavior is desired, this also applies to 
> Child JVM.
> This issue focuses on TaskTracker memory optimization, on our cluster, 
> TaskTracker use 600M+ memory & 300%+(3core+) CPU at peak, and 300M+ memory & 
> much less CPU in average, so we need to set -Xmx to 1000M for TT to prevent 
> OOM, then the TT memory is in 200M-1200M range, and 800M in average. 
> Here are some ideas:  
> Jetty http connection use a lot memory when these are many requests in queue, 
> we need to limit the length of the queue, combine multiple requests into one 
> request, or use netty just like MR2
> TaskCompletionEvents use a lot memory too if a job have large number of map 
> task, this won't be a problem in MR2, but can be optimized, A typical 
> TaskCompletionEvent object use 296 bytes memory, a job with 100000 map will 
> use about 30M memory, problem will appear if there are some big RunningJob in 
> a TaskTracker. There are more memory efficient implementations for 
> TaskCompletionEvent.
> IndexCache: memory of indexcache varies directly as reduce number, on large 
> cluster 10MB of indexcache is not enough, 
> we set it to 100MB, again use primitive long[] instead of IndexRecord[] can 
> save 50% of memory.
> Although some of the above won't be a problem in MR-v2, since MR-v1 is still 
> widely used, I think optimizations are needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to