Jay: You don't need to do anything to keep a client connected to ZK. A heartbeat mechanism is built into the ZK session semantics. In other words, there's no such thing as being disconnected from ZK due to inactivity, because many coordination algorithms require liveness (i.e. mere presence) to function correctly. You can confirm this by reading through http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions
...although the following paragraph is what you're looking for:

"The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server."

Specifically, this bug is real, but not caused by idle disconnects. It would be an error to attempt to "manage" the ZK session. You're not even supposed to handle reconnects yourself with ZK (because of the herd effect); ZK handles this by internally managing retries and then, upon successfully reestablishing the connection, deciding if you are expired.

On Mon, May 7, 2012 at 3:03 PM, Jay Stricks <[email protected]> wrote:

> I'm wondering how people ensure that their masters stay connected to the
> ZooKeeper server during long periods of time when no config changes are
> made. I'm referring specifically to the issues raised in FLUME-60
> (https://issues.apache.org/jira/browse/FLUME-60):
>
> This seems related to long pauses or breakpoints. Disconnecting from ZK is
> probably reasonable in these conditions, but ideally the connection should
> be recovered.
>
> As an example, after a long pause, a command that modifies ZK state has
> this error message:
>
> Not connected to ZooKeeper: CLOSED
>
>
> I'm trying to think of possible solutions that don't require restarting
> the master. One idea is to have a test agent periodically issue
> configuration statements to each master, but are there any other ideas out
> there?
>
> Thanks,
>
> Jay
>

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
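
To make the session behavior described in the reply above concrete, here is a minimal sketch of how an application might react to session state changes rather than trying to manage the connection itself. It is a plain-Python model, not the ZooKeeper client API: the names `SessionState`, `Action`, and `on_session_event` are hypothetical and exist only for illustration. The assumption is that the application is told about three states (connected, disconnected, expired) and only has to pick a response.

```python
from enum import Enum


class SessionState(Enum):
    # Hypothetical stand-in for the session states discussed above;
    # this is not the ZooKeeper client's own type.
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    EXPIRED = "expired"


class Action(Enum):
    RESUME = "resume normal operation"
    WAIT = "pause writes and wait; the client library reconnects on its own"
    REBUILD = "create a new client handle and re-register ephemeral state"


def on_session_event(state: SessionState) -> Action:
    """Hypothetical policy: map a session state change to an application action."""
    if state is SessionState.CONNECTED:
        return Action.RESUME
    if state is SessionState.DISCONNECTED:
        # Do not reconnect by hand; the library retries internally.
        return Action.WAIT
    # Expiry is decided by the server and only reported after reconnection,
    # so the old handle is unusable at this point.
    return Action.REBUILD


if __name__ == "__main__":
    for state in SessionState:
        print(f"{state.name}: {on_session_event(state).value}")
```

The point of the split is that only expiry requires the application to rebuild anything. A plain disconnect, such as one following a long pause, is left to the client library, which is why periodically issuing dummy configuration statements should not be needed to keep the session alive.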
