Specifically, for dealing with a large number of clients, you can use ZooKeeper Observers.
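Observers are ZooKeeper servers that receive and apply updates from the leader but do not vote. You can add them to absorb client read load without making writes slower, because the voting quorum stays the same size. The Python sketch below renders the `server.N` lines of a `zoo.cfg` fragment. Observers get the `:observer` suffix in every server's config, and `peerType=observer` appears only in an Observer's own config. The hostnames, the sequential server ids and the helper name `render_zoo_cfg` are hypothetical. The fragment also leaves out required settings such as `dataDir` and `clientPort`.

```python
def render_zoo_cfg(voters, observers, local_host, tick_time_ms=2000,
                   peer_port=2888, election_port=3888):
    """Render a zoo.cfg fragment listing voting members and Observers.

    Hypothetical helper: voters/observers are lists of hostnames, and
    server ids are assigned sequentially from 1, voters first.
    """
    lines = [f"tickTime={tick_time_ms}"]
    if local_host in observers:
        # Only the config file on an Observer itself declares peerType=observer.
        lines.append("peerType=observer")
    sid = 1
    for host in voters:
        # Voting members: host:peerPort:electionPort
        lines.append(f"server.{sid}={host}:{peer_port}:{election_port}")
        sid += 1
    for host in observers:
        # The ':observer' suffix must appear in every server's config.
        lines.append(f"server.{sid}={host}:{peer_port}:{election_port}:observer")
        sid += 1
    return "\n".join(lines)


print(render_zoo_cfg(["zk1", "zk2", "zk3"], ["obs1", "obs2"], local_host="obs1"))
```

Clients such as map/reduce tasks can then be pointed at the Observers, so the extra read traffic does not add voters to the quorum.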
---------- Forwarded message ----------
From: Eric Newton <[email protected]>
Date: Fri, Mar 14, 2014 at 3:18 PM
Subject: HA namenode questions
To: [email protected]

For those of you running HA NN on large clusters, I'm looking for some advice.

I was looking at an HA NN config today. Either by default, or by following the configuration instructions, I saw that the zookeeper timeout was set to 5 seconds.

* is this a reasonable timeout?
* do you provide HA NN its own set of zookeepers?

We have seen problems with large GC pauses with tablet servers. This happens less and less as we have learned more tricks, but I'm constantly talking to users who want their zookeeper timeout as high as two minutes.

We have also had to increase the number of zookeepers on our largest clusters in order to handle the "thundering herd" load when large map/reduce jobs kick off and they all start talking to accumulo, which requires reading information from zookeeper.

Any experience you can share about HA NN configuration at scales over a few hundred nodes would be appreciated.

-Eric
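A related point on the timeout question above: a client only requests a session timeout. The ZooKeeper server clamps that request to `[minSessionTimeout, maxSessionTimeout]`, which default to 2 × `tickTime` and 20 × `tickTime`. With the common `tickTime=2000`, a request for two minutes is therefore granted as 40 seconds unless `maxSessionTimeout` is raised on the servers. The 5-second value seen in the HA NN config likely corresponds to Hadoop's `ha.zookeeper.session-timeout.ms`, whose default is 5000 ms. The function below is a minimal sketch of the clamping rule only. Its name is hypothetical and it is not ZooKeeper's own code.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Sketch of how a ZooKeeper server bounds a requested session timeout.

    Hypothetical helper: unset bounds fall back to ZooKeeper's defaults of
    2 * tickTime (minimum) and 20 * tickTime (maximum).
    """
    lo = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    hi = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    # Clamp the client's request into [lo, hi].
    return max(lo, min(requested_ms, hi))


print(negotiated_session_timeout(5_000))    # within default bounds
print(negotiated_session_timeout(120_000))  # clamped to 20 * tickTime
```

In practice, a tablet server or ZKFC that sits in a GC pause longer than the granted timeout loses its session. To actually grant the two-minute timeouts users ask for, you have to raise the server-side `maxSessionTimeout` as well as the client setting.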
