[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338926#comment-14338926 ]
Colin Patrick McCabe commented on HDFS-7836:
--------------------------------------------

Thanks for taking a look at this, [~cnauroth] and [~arpitagarwal].

bq. Chris wrote: Do you intend to enforce an upper limit on growth of the off-heap allocation? If so, do you see this as a new configuration property or as a function of an existing parameter (i.e. equal to max heap, with the consideration that the block map takes ~50% of the heap now)? -Xmx alone will no longer be sufficient to define a ceiling for RAM utilization by the NameNode process. This can be important in deployments that choose to co-locate other Hadoop daemons on the same hosts as the NameNode.

That's an interesting question. It is certainly possible to put an upper limit on the growth of the total process heap (Java + non-Java) by using {{ulimit -m}} on any UNIX-like OS. I'm not sure I would recommend this configuration, though, since a process that exceeds the ulimit is simply terminated. In most cases it would be better to have the management software running on the cluster examine heap sizes and warn when memory is getting low.

bq. Can I take this to mean that there will be no new native code written as part of this project? Of course, we can always do a native code implementation later if use of a private Sun API becomes problematic, but I wanted to understand the code footprint for the current proposal. Avoiding native code entirely would be nice, because it reduces the scope of testing efforts across multiple platforms.

Yeah, we are hoping to avoid writing any JNI code here. So far, it looks good on that front.

bq. Are you proposing that off-heaping is an opt-in feature that must be explicitly enabled in configuration, or are you proposing that off-heaping will be the new default behavior?
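(As a concrete sketch of the "private Sun API instead of JNI" point above: off-heap allocation in pure Java typically goes through {{sun.misc.Unsafe}}, obtained via reflection. This is only an illustration of the technique, not code from the HDFS-7836 patch; the class name and values are made up.)

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    public static void main(String[] args) throws Exception {
        // sun.misc.Unsafe is a private Sun API; the usual trick is to
        // grab the singleton via reflection rather than Unsafe.getUnsafe().
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Allocate 1 MB outside the Java heap. Note that -Xmx does not
        // bound this memory, which is exactly Chris's ceiling concern.
        long addr = unsafe.allocateMemory(1 << 20);
        try {
            unsafe.putLong(addr, 0xCAFEBABEL);
            System.out.println(Long.toHexString(unsafe.getLong(addr)));
        } finally {
            // Off-heap memory is not garbage collected; free it explicitly.
            unsafe.freeMemory(addr);
        }
    }
}
```

No JNI is involved, which matches the "no new native code" goal, at the cost of depending on an unsupported API.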
Arguably, jumping to off-heaping as the default could be seen as backwards-incompatible with existing configurations, because it might be unsafe to deploy the feature without simultaneously down-tuning the NameNode max heap size.

I think off-heaping should be on by default. HDFS gets enough bad press from having short-circuit and other optimizations turned off by default... we should be a little nicer this time :)

With regard to compatibility... even if the NN max heap size is unchanged, the NN will not actually consume that memory until it needs it, right? So existing configurations should work just fine. I guess the one exception might be if you set a super-huge *minimum* (not maximum) JVM heap size. But even in that case, I would expect things to work, since virtual memory would pick up the slack... unless you have turned off swapping, which is not recommended. As a practical matter, we have found that everyone who has a big NN heap has a big NN machine, with much more memory than we can even use currently. So I would not expect this to be a problem in practice.

bq. Arpit asked: Do you have any estimates for startup time overhead due to GCs?

There are a few big users who can shave several minutes off of their NN startup time by setting the NN minimum heap size to something large. This prevents successive rounds of stop-the-world GC, where everything is copied to a new heap twice as large. We will get some before-and-after startup numbers later that should illustrate this even more clearly.

> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
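(To make the startup-time point in the comment above observable: with a minimum heap pinned to the maximum, the committed heap already sits at the ceiling from startup, so the JVM never has to grow the heap through successive stop-the-world copying collections. This is a generic sketch, not part of the patch; the class name {{HeapCheck}} and flag values are made up.)

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory() >> 20; // heap committed so far, in MB
        long max = rt.maxMemory() >> 20;         // heap ceiling (~ -Xmx), in MB
        System.out.printf("committed: %d MB, max: %d MB%n", committed, max);
        // With -Xms set equal to -Xmx, the two figures roughly coincide at
        // startup; with a small -Xms, committed starts far below max and the
        // JVM grows the heap on demand via stop-the-world collections.
    }
}
```

Running something like {{java -Xms4g -Xmx4g HeapCheck}} should show the committed figure already near the ceiling, whereas a small {{-Xms}} starts well below it.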