[
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338926#comment-14338926
]
Colin Patrick McCabe commented on HDFS-7836:
--------------------------------------------
Thanks for taking a look at this, [~cnauroth] and [~arpitagarwal].
bq. Chris wrote: Do you intend to enforce an upper limit on growth of the
off-heap allocation? If so, do you see this as a new configuration property or
as a function of an existing parameter (i.e. equal to max heap with the
consideration that the block map takes ~50% of the heap now)? -Xmx alone will
no longer be sufficient to define a ceiling for RAM utilization by the NameNode
process. This can be important in deployments that choose to co-locate other
Hadoop daemons on the same hosts as the NameNode.
That's an interesting question. It is certainly possible to put an upper limit
on the growth of the total process memory (Java heap + non-Java allocations) by
using {{ulimit \-m}} on any UNIX-like OS. I'm not sure I would recommend that
configuration, though, since the behavior when the ulimit is exceeded is that
the process is terminated. In most cases I think it would be better to have the
management software running on the cluster examine heap sizes and warn when
memory is getting low.
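As a rough illustration of the kind of check that management software could run (not part of this change; the 90% threshold and class name are just assumptions for the sketch, and in practice the beans would be read from the NameNode remotely over JMX):
{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {
  public static void main(String[] args) {
    MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = memBean.getHeapMemoryUsage();
    MemoryUsage nonHeap = memBean.getNonHeapMemoryUsage();

    System.out.printf("heap used = %d MB of %d MB max, non-heap used = %d MB%n",
        heap.getUsed() >> 20, heap.getMax() >> 20, nonHeap.getUsed() >> 20);

    // Warn when used heap approaches the -Xmx ceiling (threshold is arbitrary here).
    if ((double) heap.getUsed() / heap.getMax() > 0.9) {
      System.err.println("WARNING: heap usage is above 90% of the maximum");
    }
  }
}
{code}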
bq. Can I take this to mean that there will be no new native code written as
part of this project? Of course, we can always do a native code implementation
later if use of a private Sun API becomes problematic, but I wanted to
understand the code footprint for the current proposal. Avoiding native code
entirely would be nice, because it reduces the scope of testing efforts across
multiple platforms.
Yeah, we are hoping to avoid writing any JNI code here. So far, it looks good
on that front.
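To make the "no JNI" point concrete: off-heap memory can be allocated from pure Java through the private Sun API Chris mentioned ({{sun.misc.Unsafe}}), or through {{ByteBuffer#allocateDirect}}. The following is only a sketch of that general technique, not code from the patch:
{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapSketch {
  public static void main(String[] args) throws Exception {
    // Obtain the Unsafe singleton via reflection (it is a private Sun API).
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

    // Allocate 1 MB outside the Java heap; the GC never scans or moves it.
    long addr = unsafe.allocateMemory(1024 * 1024);
    try {
      unsafe.putLong(addr, 0xCAFEBABEL);   // write at the raw address
      System.out.printf("read back 0x%x%n", unsafe.getLong(addr));
    } finally {
      // Off-heap memory must be freed explicitly.
      unsafe.freeMemory(addr);
    }
  }
}
{code}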
bq. Are you proposing that off-heaping is an opt-in feature that must be
explicitly enabled in configuration, or are you proposing that off-heaping will
be the new default behavior? Arguably, jumping to off-heaping as the default
could be seen as a backwards-incompatibility, because it might be unsafe to
deploy the feature without simultaneously down-tuning the NameNode max heap size.
Some might see that as backwards-incompatible with existing configurations.
I think off-heaping should be on by default. HDFS gets enough bad press from
having short-circuit and other optimizations turned off by default... we should
be a little nicer this time :)
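If we do keep a switch for it, I would expect an ordinary boolean property that defaults to on. The property name below is purely hypothetical, just to show what a default-on, opt-out knob would look like through the normal {{Configuration}} machinery:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class OffHeapSwitchSketch {
  // Hypothetical key and default; the real name would be settled in the patch.
  static final String DFS_NAMENODE_BLOCKMAP_OFFHEAP_KEY =
      "dfs.namenode.blockmap.offheap.enabled";
  static final boolean DFS_NAMENODE_BLOCKMAP_OFFHEAP_DEFAULT = true;

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Defaults to true (off-heaping on); an operator can opt out in hdfs-site.xml.
    boolean offHeap = conf.getBoolean(DFS_NAMENODE_BLOCKMAP_OFFHEAP_KEY,
        DFS_NAMENODE_BLOCKMAP_OFFHEAP_DEFAULT);
    System.out.println("block map off-heap enabled: " + offHeap);
  }
}
{code}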
With regard to compatibility... even if the NN max heap size is unchanged, the
NN will not actually consume that memory until it needs it, right? So existing
configurations should work just fine. I guess the one exception might be if you
set a super-huge *minimum* (not maximum) JVM heap size. But even in that case I
would expect things to work, since virtual memory would pick up the slack...
unless you have turned off swapping, which is not recommended anyway. As a
practical matter, we have found that everyone who has a big NN heap has a big
NN machine, with much more memory than we can even use currently. So I would
not expect this to be a problem in practice.
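The distinction I'm leaning on is between the -Xmx ceiling and what the JVM has actually committed. A trivial snippet (illustrative only) that prints both:
{code:java}
public class HeapSizes {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    // maxMemory():   the -Xmx ceiling the JVM may grow to
    // totalMemory(): what the JVM has actually committed so far
    // freeMemory():  committed-but-unused space inside the heap
    System.out.printf("max (-Xmx) = %d MB%n", rt.maxMemory() >> 20);
    System.out.printf("committed  = %d MB%n", rt.totalMemory() >> 20);
    System.out.printf("free       = %d MB%n", rt.freeMemory() >> 20);
  }
}
{code}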
bq. Arpit asked: Do you have any estimates for startup time overhead due to GCs?
There are a few big users who can shave several minutes off of their NN startup
time by setting the NN minimum heap size to something large. This avoids
successive rounds of stop-the-world GC in which everything is copied to a new
heap twice as large as the old one. We will collect some before-and-after
startup numbers later that should illustrate this even more clearly.
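For those numbers, the GC overhead during startup can be read straight off the JVM's collector beans; a minimal sketch of that kind of measurement (class name and output format are just placeholders):
{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStartupCost {
  public static void main(String[] args) {
    long totalCollections = 0;
    long totalTimeMs = 0;
    // Sum counts and times across all collectors (young and old generation).
    for (GarbageCollectorMXBean gc :
         ManagementFactory.getGarbageCollectorMXBeans()) {
      totalCollections += gc.getCollectionCount();
      totalTimeMs += gc.getCollectionTime();
      System.out.printf("%s: %d collections, %d ms%n",
          gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
    }
    System.out.printf("total: %d collections, %d ms spent in GC%n",
        totalCollections, totalTimeMs);
  }
}
{code}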
> BlockManager Scalability Improvements
> -------------------------------------
>
> Key: HDFS-7836
> URL: https://issues.apache.org/jira/browse/HDFS-7836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Charles Lamb
> Assignee: Charles Lamb
> Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.