If you cannot tolerate this sort of situation, then typically the only solution is to avoid sequential ephemerals. The problem is that in the presence of a flaky network you cannot always tell whether a failed create actually created the znode in question, because the network may have failed after the create succeeded but before you got the result. In that case, since this is a sequential ephemeral, you can't know if your znode got created because you don't even know its name. Moreover, scanning doesn't help because if you could scan, you probably could have used a fixed unique name in the first place.
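To make the failure mode concrete, here is a rough sketch using a toy in-memory stand-in for the ensemble. Everything in it (ToyEnsemble, create_sequential, create_fixed, create_with_retry, the paths and session ids) is hypothetical and is not the ZooKeeper client API; it only models "the create was applied but the reply was lost". With a real client the lost reply would surface as a connection-loss error, and the ownership check would compare the znode's ephemeral owner with your session id.

```python
class ConnectionLoss(Exception):
    """Reply was lost; the request may or may not have been applied."""


class NodeExists(Exception):
    """A node with the requested fixed name is already present."""


class ToyEnsemble:
    """Hypothetical in-memory model of an ensemble, for illustration only."""

    def __init__(self, drop_replies=0):
        self.nodes = {}                   # path -> owning session id
        self.counter = 0                  # server-side sequence counter
        self.drop_replies = drop_replies  # how many upcoming replies get lost

    def _reply(self, value):
        # The write has already been applied when the reply is dropped.
        if self.drop_replies > 0:
            self.drop_replies -= 1
            raise ConnectionLoss()
        return value

    def create_sequential(self, prefix, session):
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = session
        return self._reply(path)

    def create_fixed(self, path, session):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = session
        return self._reply(path)

    def owner(self, path):
        return self.nodes.get(path)


def create_with_retry(zk, path, session, attempts=5):
    """Retry a fixed-name create; safe because the name never changes."""
    for _ in range(attempts):
        try:
            return zk.create_fixed(path, session)
        except ConnectionLoss:
            continue  # outcome unknown, so simply ask again
        except NodeExists:
            if zk.owner(path) == session:
                return path  # an earlier attempt of ours went through
            raise            # somebody else owns that name
    raise ConnectionLoss()


# Sequential case: the node exists, but the client never learns its name,
# and a naive retry leaves a second node behind.
zk = ToyEnsemble(drop_replies=1)
try:
    zk.create_sequential("/locks/lock-", session="s1")
except ConnectionLoss:
    pass
orphans = list(zk.nodes)
second = zk.create_sequential("/locks/lock-", session="s1")

# Fixed unique name: the retry is idempotent, so exactly one node results.
zk2 = ToyEnsemble(drop_replies=1)
member_path = create_with_retry(zk2, "/members/client-a", session="s1")
```

The point of the second half is that the fixed name is known before the request is sent, so "did my create succeed?" becomes a question you can answer after reconnecting.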
There is a very long standing proposed (nearly complete) solution for this that requires some difficult coding. See https://issues.apache.org/jira/browse/ZOOKEEPER-22

2011/9/21 Fournier, Camille F. <[email protected]>

> This is expected. In cases where the network becomes unstable, it is the
> responsibility of the client writer to handle disconnected events
> appropriately and check to verify whether nodes they tried to write around
> the time of these events did or did not succeed. It makes writing a
> "Generic" client for ZK very difficult (search the mailing list for zkclient
> and you'll read a bunch of convos around this topic). Fortunately, many
> things that rely on EPHEMERAL_SEQUENTIAL nodes can tolerate some duplication
> of data, so often it's not a huge problem.
>
> C
>
> -----Original Message-----
> From: 박영근(Alex) [mailto:[email protected]]
> Sent: Wednesday, September 21, 2011 9:16 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Creating a znode with SEQUENTIAL_EPHEMERAL mode becomes corrupt in
> case of unstable network
>
> Hi, All
>
> I met a problem in creating a znode with SEQUENTIAL_EPHEMERAL mode under
> unstable network condition.
>
> While a client did not receive a message that a sequential node was created,
> the ensemble has the znode, which is checked at zookeeper dashboard
> (https://github.com/phunt/zookeeper_dashboard).
>
> If the client receives a DISCONNECTED event, it tries to reconnect.
> Session timeout is 30 seconds.
>
> Unstable network condition is made as the following:
>
> The grinder agent sends a request of creating a znode of
> CreateMode.SEQUENTIAL_EPHEMERAL.
> ZK ensemble has three servers.
> Each NIC of server is down and up repeatedly;
> NIC of server1 become down every one minute and sleeps for 9 seconds, then up
> NIC of server2 become down every 2 minute and sleeps for 9 seconds, then up
> NIC of server3 become down every 3 minute and sleeps for 9 seconds, then up
>
> Is there any idea or related issue?
>
> Thanks in advance.
>
> Alex
>
