The ZooKeeper devs suggest giving 1 GB of heap to each process. I run it
with the default heap (256 MB) and it's stable for me, but I run relatively
small clusters.
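
If you do want to bump the heap on a standalone ZooKeeper install, one
place to do it is a conf/java.env file next to zoo.cfg, which the stock
zkServer.sh startup scripts source. A minimal sketch (the 1 GB figure is
just the devs' suggestion above):

  # conf/java.env -- picked up by zkServer.sh at startup
  export SERVER_JVMFLAGS="-Xmx1g -Xms1g"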


ZK wants its own disk for the transaction log: it syncs the log to disk
before acknowledging each write, so seek contention from other workloads
shows up directly as write latency. So if you can, dedicate a disk, or run
ZK on separate servers.
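
For example, you can split the snapshot and transaction log directories in
zoo.cfg so the log gets its own spindle (paths below are just placeholders):

  # zoo.cfg
  dataDir=/disk1/zookeeper/data        # snapshots
  dataLogDir=/disk2/zookeeper/txnlog   # transaction log on a dedicated disk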

Our EC2 scripts start a separate ZK quorum ensemble. 

It's really better to run ZK on separate servers if you can spare them.
This decouples ZK from any HBase or HDFS load. ZK is especially
sensitive to latencies introduced by CPU or I/O contention.
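
If you do run a separate ensemble, HBase can be pointed at it instead of
managing its own ZK. A sketch of the two relevant settings (hostnames here
are placeholders):

  # hbase-env.sh: don't let HBase start/stop ZK itself
  export HBASE_MANAGES_ZK=false

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>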

ZK is a 2N+1 fault tolerant system: an ensemble of 2N+1 instances tolerates
the loss of N. Run 3 to tolerate the loss of 1 instance, 5 to tolerate the
loss of 2, and so on. Based on literature I've seen there are diminishing
returns after an ensemble size of about 9. Increase the number of instances
in the ensemble on roughly a log scale as your cluster size increases, e.g.
use 3 for a cluster of 4-50 servers, 5 for 50-1000, 7 for 1000+, 9 for
10000+. There's no hard rule there. I'd recommend monitoring average read
and write latency and adjusting the ensemble size as needed.
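
As a minimal sketch, a 3-instance ensemble is just a server list in zoo.cfg,
and latency can be checked with the 'stat' four-letter command (hostnames
are placeholders):

  # zoo.cfg, same on all three nodes; each node also needs its own myid file
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888

  # report min/avg/max request latency from a running instance
  echo stat | nc zk1.example.com 2181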

Hope that helps, 

   - Andy


----- Original Message ----
> From: Michał Podsiadłowski <[email protected]>
> To: [email protected]
> Sent: Tue, February 9, 2010 8:10:50 AM
> Subject: Zookeeper - usage and load
> 
> Hi all!
> 
> Can someone drop me a few words about how exactly HBase currently utilizes
> ZooKeeper? What kind of load does it take during intensive load on HBase? What
> heap space does it need to operate correctly, and how much disk space? How many
> instances are needed if we have only 3 region servers and one HMaster? Since
> there is only one, there isn't much to elect in case of its failure. Are
> there any other operations apart from master election/lookup?
> I was trying to google it but there isn't much I can find except for a few
> JIRA issues.
> 
> Thanks,
> Michal




