The ZK developers seem to recommend giving it at least 1 GB.
That should not matter for HBase 0.20. In 0.21, our use of ZK will expand
and it will need more memory. If you plan on using ZK for anything besides
HBase, make sure you give it more memory. For now, you're probably okay
with 256 MB.
Andrew Purtell wrote:
That looks good to me, in line with the best practices that are gelling as
we collectively gain operational experience.
This is how we allocate RAM on our 8GB worker nodes:
Hadoop
  DataNode - 1 GB
  TaskTracker - 256 MB (JVM default)
  map/reduce tasks - 200 MB each (Hadoop default)
HBase
  ZK - 256 MB (JVM default)
  Master - 1 GB (HBase default, but actual use is < 500 MB)
  RegionServer - 4 GB
We have a Master and hot spare Master each running on one of the
workers.
Our workers are dual quad-core, so we have them configured for a maximum
of 4 concurrent mappers and 2 concurrent reducers. We run the
TaskTracker (and therefore also the tasks) with niceness +10 to hint to
the OS that the DataNodes, ZK quorum peers, and RegionServers should be
scheduled ahead of them.
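
As a rough check on the figures above, here is a minimal sketch that adds up the listed sizes against an 8 GB worker. It assumes the numbers are JVM heap limits, that all 6 task slots are busy at once, and that the node hosts both a ZK peer and a Master (only some workers do). The names in the script are made up for illustration; real resident memory will be higher than heap because of JVM and OS overhead.

```python
# Per-process sizes in MB, taken from the allocation list above.
ALLOCATION_MB = {
    "DataNode": 1024,
    "TaskTracker": 256,
    "ZK": 256,
    "Master": 1024,
    "RegionServer": 4096,
}
TASK_HEAP_MB = 200      # per map/reduce task (Hadoop default)
MAX_MAPPERS = 4         # concurrent mappers per worker
MAX_REDUCERS = 2        # concurrent reducers per worker
NODE_RAM_MB = 8 * 1024  # 8 GB worker


def committed_mb(with_master=True, with_zk=True):
    """Sum the heap limits for the daemons on one worker plus all task slots."""
    total = 0
    for name, mb in ALLOCATION_MB.items():
        if name == "Master" and not with_master:
            continue
        if name == "ZK" and not with_zk:
            continue
        total += mb
    total += (MAX_MAPPERS + MAX_REDUCERS) * TASK_HEAP_MB
    return total


if __name__ == "__main__":
    for label, kwargs in [
        ("worker with Master and ZK", {}),
        ("plain worker", {"with_master": False, "with_zk": False}),
    ]:
        used = committed_mb(**kwargs)
        print(f"{label}: {used} MB committed, {NODE_RAM_MB - used} MB headroom")
```

Under these assumptions the fully loaded worker commits 7856 MB, which leaves only about 336 MB for the OS and page cache, while a worker without the Master or a ZK peer has roughly 1.6 GB to spare.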
Note that the Hadoop NameNode is a special case: we run it in a
standalone configuration with block-device-level replication to a hot
spare, set up in the typical HA fashion with heartbeat monitoring,
fencing via power control operations, a virtual IP address with L3
failover, etc.
Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
reasonable: 3, 5, 7, or 9. I expect that a 7- or 9-node ensemble can
handle thousands of clients if the quorum peers are running on dedicated
hardware. We are considering this type of deployment for the future.
For now, however, we colocate ZK quorum peers with (some) HBase
RegionServers.
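
The reason for the 2N+1 sizes is that ZooKeeper needs a strict majority of peers to be up, so an ensemble of 2N+1 survives N failures and adding one more peer to make it even buys nothing. A small sketch of that arithmetic (the function names are illustrative, not part of any ZK API):

```python
def majority(ensemble_size):
    """Smallest number of peers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many peers can be down while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7, 9):
        print(f"{n} peers: quorum of {majority(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

A 4-peer ensemble still only survives 1 failure, the same as 3 peers, which is why the odd sizes are the ones worth considering.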
Our next generation of workers will use 32 GB. This can support
aggressive caching and in-memory tables.
- Andy
________________________________
From: Fernando Padilla <[email protected]>
To: [email protected]
Sent: Friday, July 17, 2009 10:30:52 AM
Subject: Re: hbase/zookeeper
Thank you!
I'll pay attention to the CPU load then. Any tips about the memory
distribution? This is what I'm expecting, but I'm a newb. :)
DataNode - 1.5 GB
TaskTracker - 0.5 GB
ZooKeeper - 0.5 GB
RegionServer - 2 GB
M/R - 2 GB