Ok.. so it seems like ZK and TT can be smaller than we thought.. at
least it's an option. :)
How much memory are you giving the NameNode? And the SecondaryNameNode?
It looks like those are beefy on your setup for HA purposes, but do
they take a lot of CPU? If I ignore HA, could they share a box with other
services?
Andrew Purtell wrote:
That looks good to me, in line with the best practices that are gelling as
we collectively gain operational experience.
This is how we allocate RAM on our 8 GB worker nodes:

Hadoop
  DataNode - 1 GB
  TaskTracker - 256 MB (JVM default)
  map/reduce tasks - 200 MB each (Hadoop default)
HBase
  ZK - 256 MB (JVM default)
  Master - 1 GB (HBase default, but actual use is < 500 MB)
  RegionServer - 4 GB

We have a Master and hot spare Master each running on one of the
workers.
Our workers are dual quad core, so we configure them for a maximum of
4 concurrent mappers and 2 concurrent reducers. We run the TaskTracker
(and therefore also the tasks) with niceness +10 to hint to the OS that
the DataNodes, ZK quorum peers, and RegionServers should be scheduled
ahead of them.
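
To sanity check the numbers above, here is a rough sketch that adds up
the configured heaps for a worker that happens to run every daemon in
the list, with all 4 map and 2 reduce slots busy. It assumes 1 GB =
1024 MB and counts only configured heap, not JVM overhead or the OS
page cache, so treat the result as an estimate rather than a
measurement. The names (NODE_RAM_MB, worst_case_heap_mb, etc.) are
made up for illustration.

```python
# Per-node heap budget in MB, taken from the allocation listed above.
# Assumption: 1 GB is treated as 1024 MB.
NODE_RAM_MB = 8 * 1024

DAEMON_HEAP_MB = {
    "DataNode": 1024,
    "TaskTracker": 256,
    "ZK": 256,            # only on nodes that host a quorum peer
    "Master": 1024,       # configured heap; actual use reported as < 500 MB
    "RegionServer": 4096,
}

TASK_HEAP_MB = 200        # per map/reduce task
MAX_MAPPERS = 4
MAX_REDUCERS = 2


def worst_case_heap_mb():
    """Sum of all daemon heaps plus every task slot filled at once."""
    task_total = TASK_HEAP_MB * (MAX_MAPPERS + MAX_REDUCERS)
    return sum(DAEMON_HEAP_MB.values()) + task_total


total = worst_case_heap_mb()
headroom = NODE_RAM_MB - total
print(f"configured heap: {total} MB, headroom: {headroom} MB")
```

Under these assumptions the budget nearly fills the 8 GB box, which is
one more reason to watch actual usage rather than configured limits.
Workers that do not host a Master or a ZK peer have correspondingly
more room.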
Note that the Hadoop NameNode is a special case: we run the NN in a
standalone configuration with block device level replication to a hot
spare, configured in the typical HA fashion: heartbeat monitoring,
fencing via power control operations, virtual IP address and L3
failover, etc.
Also, not all nodes participate in the ZK ensemble. Some 2N+1 subset is
reasonable: 3, 5, 7, or 9. I expect that a 7 or 9 node ensemble can
handle 1000s of clients, if the quorum peers are running on dedicated
hardware. We are considering this type of deployment for the future.
However, for now we colocate ZK quorum peers with (some) HBase
regionservers.
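
The reason for odd (2N+1) ensemble sizes is that ZooKeeper needs a
strict majority of peers to stay available, so a 2N+1 ensemble
survives N failures and adding one more peer to make it even buys
nothing. A small sketch of that arithmetic follows; the function names
are invented for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Peers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5, 7, 9):
    print(f"{n} peers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
```

Note that tolerated_failures(4) equals tolerated_failures(3), which is
why even sizes are usually skipped.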
Our next generation of nodes will have 32 GB of RAM. That can support
aggressive caching and in-memory tables.
- Andy
________________________________
From: Fernando Padilla <[email protected]>
To: [email protected]
Sent: Friday, July 17, 2009 10:30:52 AM
Subject: Re: hbase/zookeeper
thank you!
I'll pay attention to the CPU load then. Any tips about the memory
distribution? This is what I'm expecting, but I'm a newb. :)
DataNode - 1.5G
TaskTracker - .5G
Zookeeper - .5G
RegionServer - 2G
M/R - 2G