[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338926#comment-14338926 ]

Colin Patrick McCabe commented on HDFS-7836:
--------------------------------------------

Thanks for taking a look at this, [~cnauroth] and [~arpitagarwal].

bq. Chris wrote: Do you intend to enforce an upper limit on growth of the 
off-heap allocation? If so, do you see this as a new configuration property or 
as a function of an existing parameter (i.e. equal to max heap with the 
consideration that the block map takes ~50% of the heap now)? -Xmx alone will 
no longer be sufficient to define a ceiling for RAM utilization by the NameNode 
process. This can be important in deployments that choose to co-locate other 
Hadoop daemons on the same hosts as the NameNode.

That's an interesting question.  It is certainly possible to put an upper limit 
on the growth of the total process memory (Java heap plus native) by using 
{{ulimit \-m}} on any UNIX-like OS.  I'm not sure I would recommend this 
configuration, though, since the behavior when the ulimit is exceeded is that 
the process is terminated.  I think it would be better in most cases to have 
the management software running on the cluster examine heap sizes and warn 
when memory is getting low.

bq. Can I take this to mean that there will be no new native code written as 
part of this project? Of course, we can always do a native code implementation 
later if use of a private Sun API becomes problematic, but I wanted to 
understand the code footprint for the current proposal. Avoiding native code 
entirely would be nice, because it reduces the scope of testing efforts across 
multiple platforms.

Yeah, we are hoping to avoid writing any JNI code here.  So far, it looks good 
on that front.
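For the curious, the "private Sun API" being discussed is {{sun.misc.Unsafe}}, 
which lets pure-Java code allocate, read, and free native memory without any 
JNI.  A minimal sketch of the idea (illustrative only, not the actual patch; it 
assumes a JDK 7/8-era JVM where {{sun.misc.Unsafe}} is still reachable via 
reflection):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapSketch {
    public static void main(String[] args) throws Exception {
        // Grab the Unsafe singleton via reflection; its constructor is private.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Allocate 1 MB off-heap; this memory is invisible to the GC.
        long addr = unsafe.allocateMemory(1024 * 1024);
        try {
            unsafe.putLong(addr, 0xCAFEBABEL);  // raw write to native memory
            System.out.println(Long.toHexString(unsafe.getLong(addr)));
        } finally {
            // No GC here: freeing off-heap memory is our responsibility.
            unsafe.freeMemory(addr);
        }
    }
}
```

The trade-off is visible in the {{finally}} block: since the collector never 
sees this memory, every allocation must be paired with an explicit free.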

bq. Are you proposing that off-heaping is an opt-in feature that must be 
explicitly enabled in configuration, or are you proposing that off-heaping will 
be the new default behavior? Arguably, jumping to off-heaping as the default 
could be seen as a backwards-incompatibility, because it might be unsafe to 
deploy the feature without simultaneously down-tuning the NameNode max heap 
size. Some might see that as backwards-incompatible with existing configurations.

I think off-heaping should be on by default.  HDFS gets enough bad press from 
having short-circuit and other optimizations turned off by default... we should 
be a little nicer this time :)

With regard to compatibility... even if the NN max heap size is unchanged, the 
NN will not actually consume that memory until it needs it, right?  So existing 
configuration should work just fine.  I guess the one exception might be if you 
set a super-huge *minimum* (not maximum) JVM heap size.  But even in that case, 
I would expect things to work, since virtual memory would pick up the slack.  
Unless you have turned off swapping, but that's not recommended.  As a 
practical matter, we have found that everyone who has a big NN heap has a big 
NN machine, which has much more memory than we can even use currently.  So I 
would not expect this to be a problem in practice.
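The claim that the NN will not actually consume its configured maximum until it 
needs it can be checked with the standard {{java.lang.Runtime}} API (a generic 
JVM illustration, not HDFS code): {{\-Xmx}} only sets the ceiling reported by 
{{maxMemory()}}, while {{totalMemory()}}, the heap actually committed, starts 
near {{\-Xms}} and grows on demand.

```java
public class HeapCeiling {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();      // ceiling set by -Xmx
        long total = rt.totalMemory();  // heap committed so far (>= -Xms)
        long free = rt.freeMemory();    // committed but currently unused

        System.out.printf("max=%dMB committed=%dMB used=%dMB%n",
                max >> 20, total >> 20, (total - free) >> 20);

        // The committed heap can never exceed the -Xmx ceiling.
        if (total > max) {
            throw new AssertionError("committed heap exceeded -Xmx ceiling");
        }
    }
}
```

Run with a small {{\-Xms}} and a large {{\-Xmx}} and the committed figure stays 
well below the ceiling until the application actually allocates that much.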

bq. Arpit asked: Do you have any estimates for startup time overhead due to GCs?

There are a few big users who can shave several minutes off of their NN startup 
time by setting the NN minimum heap size to something large.  This prevents 
successive rounds of stop-the-world GC in which everything is copied to a new 
heap twice the size of the old one.  We will get some before-and-after startup 
numbers later that should illustrate this even more clearly.

> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
