Hi Marcelo,

> I'm not really familiar with how multi-node HA was implemented (I
> stopped at session recovery), but why isn't a single server doing the
> update and storing the results in ZK? Unless it's actually doing
> load-balancing, it seems like that would avoid multiple servers having
> to hit YARN.

We considered having one server update ZooKeeper, but the benefit of polling YARN less often is not worth the extra complexity needed to implement it. For example, we would have to make the servers aware of each other and of each other's failures. We would also need a voting mechanism to elect a new leader to update ZooKeeper each time the current leader failed. Rolling out updates would also be trickier with servers that are aware of each other.
