[
https://issues.apache.org/jira/browse/HADOOP-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238591#comment-14238591
]
Ben Roling commented on HADOOP-7154:
------------------------------------
Ok, so after further consideration I think my last comment/question was
probably somewhat silly. The problems that high vmem sizes present to Hadoop
are probably obvious to many, as Todd originally suggested. I feel sort of
dumb for not realizing it more quickly.

MapReduce (and YARN) monitor the virtual memory sizes of task processes and
kill them when they get too big. For example, mapreduce.map.memory.mb controls
the max virtual memory size of a map task. Without MALLOC_ARENA_MAX this would
be broken, since tasks would have hugely inflated vmem sizes.

[~tlipcon] - do I have that about right? Are there other types of problems you
were noticing?

Basically, it seems any piece of software that tries to make decisions based on
process vmem size is going to be thrown off by the glibc change and will likely
have to set MALLOC_ARENA_MAX. For some reason the fact that Hadoop makes such
decisions escaped me when I made my last comment.
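
To put rough numbers on the inflation, here is a minimal sketch using the figures from the issue description quoted below (64 MB per arena, at most 8 arenas per core on 64-bit). The helper name arena_vmem_mb is hypothetical, and the result is only an upper bound on address space reserved by malloc arenas, not on resident memory.

```shell
#!/bin/sh
# Rough upper bound, in MB, on virtual memory reserved by glibc malloc arenas.
# Assumes 64 MB per arena and a default cap of 8 arenas per core, as stated
# in the issue description. arena_vmem_mb is a hypothetical helper name.
arena_vmem_mb() {
    cores=$1
    if [ -n "$2" ]; then
        arena_max=$2              # explicit cap, e.g. the MALLOC_ARENA_MAX value
    else
        arena_max=$((cores * 8))  # default cap: 8 arenas per core
    fi
    echo $((arena_max * 64))
}

arena_vmem_mb 16      # default cap on a 16-core node: prints 8192
arena_vmem_mb 16 4    # with a cap of 4 arenas: prints 256
```

A vmem-based limit sized for a task's real footprint is easy to exceed when the first figure applies, while the second stays small regardless of core count.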
> Should set MALLOC_ARENA_MAX in hadoop-config.sh
> -----------------------------------------------
>
> Key: HADOOP-7154
> URL: https://issues.apache.org/jira/browse/HADOOP-7154
> Project: Hadoop Common
> Issue Type: Improvement
> Components: scripts
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Minor
> Fix For: 1.0.4, 0.22.0
>
> Attachments: hadoop-7154.txt
>
>
> New versions of glibc present in RHEL6 include a new arena allocator design.
> In several clusters we've seen this new allocator cause huge amounts of
> virtual memory to be used, since when multiple threads perform allocations,
> they each get their own memory arena. On a 64-bit system, these arenas are
> 64M mappings, and the maximum number of arenas is 8 times the number of
> cores. We've observed a DN process using 14GB of vmem for only 300M of
> resident set. This causes all kinds of nasty issues for obvious reasons.
> Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory
> arenas and bound the virtual memory, with no noticeable downside in
> performance - we've been recommending MALLOC_ARENA_MAX=4. We should set this
> in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common.
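
A minimal sketch of the setting proposed above, assuming it lives in a script that is sourced before the daemons start (hadoop-env.sh or hadoop-config.sh). The value 4 is the one recommended in the description; the ${VAR:-default} form is an assumption made here so that a value already exported by the operator is kept.

```shell
# Cap the number of glibc malloc arenas to bound virtual memory growth.
# Keep any value the operator has already exported; otherwise default to 4,
# the value recommended in the issue description.
export MALLOC_ARENA_MAX="${MALLOC_ARENA_MAX:-4}"
```

Because the variable is exported, child processes launched from the same environment inherit the cap.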