[ https://issues.apache.org/jira/browse/HADOOP-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238591#comment-14238591 ]

Ben Roling commented on HADOOP-7154:
------------------------------------

Ok, so after further consideration I think my last comment/question was 
probably somewhat silly.  I think the problems the high vmem sizes present to 
Hadoop are probably obvious to many as Todd originally suggested.  I feel sort 
of dumb for not realizing more quickly.

MapReduce (and YARN) monitor virtual memory sizes of task processes and kill 
them when they get too big.  For example, mapreduce.map.memory.mb controls the 
max virtual memory size of a map task.  Without MALLOC_ARENA_MAX this would be 
broken since tasks would have super inflated vmem sizes.
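To make the monitoring concrete: on Linux the vmem and rss figures come from procfs, and you can inspect them by hand. This is a rough sketch of the kind of numbers a vmem-based limit compares against, not Hadoop's actual monitoring code; the awk one-liners are my own illustration.

```shell
# Sketch (not Hadoop's code): read the virtual and resident sizes a
# vmem-based limit would check, straight from Linux procfs.
pid=$$   # substitute a real task PID here
vmem_kb=$(awk '/^VmSize:/ {print $2}' "/proc/${pid}/status")
rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/${pid}/status")
echo "vmem=${vmem_kb}kB rss=${rss_kb}kB"
```

With the glibc arena inflation, vmem can be many times larger than rss, so a kill threshold expressed in vmem fires even though the task isn't actually using that much memory.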

[~tlipcon] - do I have that about right?  Are there other types of problems you 
were noticing?

Basically it seems any piece of software that tries to make decisions based on 
process vmem size is going to be thrown off by the glibc change and will likely 
have to set MALLOC_ARENA_MAX.  For some reason the fact that Hadoop was making 
such decisions escaped me when I made my last comment.

> Should set MALLOC_ARENA_MAX in hadoop-config.sh
> -----------------------------------------------
>
>                 Key: HADOOP-7154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7154
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>             Fix For: 1.0.4, 0.22.0
>
>         Attachments: hadoop-7154.txt
>
>
> New versions of glibc present in RHEL6 include a new arena allocator design. 
> In several clusters we've seen this new allocator cause huge amounts of 
> virtual memory to be used, since when multiple threads perform allocations, 
> they each get their own memory arena. On a 64-bit system, these arenas are 
> 64M mappings, and the maximum number of arenas is 8 times the number of 
> cores. We've observed a DN process using 14GB of vmem for only 300M of 
> resident set. This causes all kinds of nasty issues for obvious reasons.
> Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory 
> arenas and bound the virtual memory, with no noticeable downside in 
> performance - we've been recommending MALLOC_ARENA_MAX=4. We should set this 
> in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common.
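To put numbers on the description above: at 8 arenas per core, a 16-core box allows up to 128 arenas of 64M each, i.e. up to 8G of address space from the allocator alone. The fix itself is a one-line fragment along the lines suggested; this is a sketch for hadoop-env.sh, with the value 4 taken from the recommendation above.

```shell
# Sketch for hadoop-env.sh: cap glibc's per-thread malloc arenas so
# virtual memory stays bounded (value per the recommendation above).
export MALLOC_ARENA_MAX=4
```

Since hadoop-env.sh is sourced by the daemon start scripts, the variable propagates into the JVM's environment and glibc picks it up at process start.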



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
