We are planning to run ZooKeeper nodes embedded with the client nodes. That is, each client also runs a ZK node, so a network partition will disconnect a ZK node and not only the client. My concern is about the following statement from the ZK documentation:
"Timeliness: The clients view of the system is guaranteed to be up-to-date within a certain time bound. (*On the order of tens of seconds.*) Either system changes will be seen by a client within this bound, or the client will detect a service outage."

What exactly determines these "*tens of seconds*"? Can we reduce this bound to, say, 5 seconds by configuring "syncLimit" and "tickTime"? And can we get a strong guarantee on this time bound?

On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> > Old service leader will detect network partition max 15 seconds after it
> > happened.
>
> If the old service leader is in a very long GC it will not detect the
> partition. In the face of VM pauses, etc. it's not possible to avoid 2
> leaders for a short period of time.
>
> -JZ
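To make the "syncLimit"/"tickTime" question concrete, here is a rough Python sketch of how the nominal timing knobs relate. It assumes the server defaults described in the ZooKeeper admin guide (syncLimit is counted in ticks; minSessionTimeout defaults to 2 × tickTime and maxSessionTimeout to 20 × tickTime, with the requested client session timeout clamped into that range). The `zk_timing` helper and `ZkTiming` type are hypothetical names for illustration, not part of any ZooKeeper API. These are nominal configuration values only, not a hard bound: as noted in the reply, a long GC or VM pause can stretch any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZkTiming:
    # Session timeout the server would grant after clamping (ms).
    negotiated_session_timeout_ms: int
    # Wall-clock window a follower has to stay in sync with the leader (ms).
    follower_sync_window_ms: int


def zk_timing(tick_time_ms: int,
              sync_limit_ticks: int,
              requested_session_timeout_ms: int,
              min_session_timeout_ms: Optional[int] = None,
              max_session_timeout_ms: Optional[int] = None) -> ZkTiming:
    """Hypothetical helper: derive nominal ZooKeeper timing values from zoo.cfg-style settings."""
    # Assumed defaults: minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    lo = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    hi = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    # The server clamps the client's requested session timeout into [lo, hi].
    negotiated = max(lo, min(requested_session_timeout_ms, hi))
    # syncLimit is expressed in ticks, so its wall-clock length is syncLimit * tickTime.
    return ZkTiming(negotiated, sync_limit_ticks * tick_time_ms)
```

For example, with tickTime=1000 and syncLimit=5, `zk_timing(1000, 5, 30000)` gives a 5-second follower sync window, but a client asking for a 30-second session would be clamped to 20 seconds unless maxSessionTimeout is raised. Shrinking these values tightens the nominal bound; it does not turn it into a strong guarantee.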