I got a bug report for Curator today and while debugging it I found out that if
the current thread is interrupted, ZK node creation will succeed on the server,
but the create() method will throw an exception. The InterruptedException gets
thrown by ClientCnxn.submitRequest() when packet.wait() is called. This makes
the lock recipe even trickier. If a multi-threaded app shares a single ZK handle
and is shutting down one of its threads, that thread could potentially start a
lock recipe before it realizes it has been interrupted.
That thread will try to create an EPHEMERAL_SEQUENTIAL node which will succeed
on the server, but the thread will throw an exception. The client has no way of
knowing what its node is and, because it's an InterruptedException, might not
even know to delete it.
Curator handles KeeperExceptions in its lock recipe by inserting a GUID in the
node name and calling getChildren when a KeeperException is caught during
EPHEMERAL_SEQUENTIAL node create. Of course, InterruptedException is not in the
category of recoverable exceptions. I'd appreciate this mailing list's thoughts
on this.
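To make the GUID idea concrete, here is a rough sketch of the lookup step only, with no ZooKeeper dependency. The class name, method names and the "<guid>-lock-" naming scheme are made up for illustration and are not Curator's actual code; the list stands in for whatever getChildren() returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class GuidNodeLookup {
    // Hypothetical naming scheme: the client-chosen GUID comes first and the
    // server appends the sequence number, e.g. "<guid>-lock-0000000007".
    static String nodePrefix(String guid) {
        return guid + "-lock-";
    }

    // Scans a getChildren()-style listing for the node carrying our GUID.
    // An empty result suggests the create never took effect on the server.
    static Optional<String> findOwnNode(List<String> children, String guid) {
        String prefix = nodePrefix(guid);
        return children.stream()
                .filter(name -> name.startsWith(prefix))
                .findFirst();
    }

    public static void main(String[] args) {
        // Stand-in for the result of zk.getChildren(parentPath, false).
        List<String> children = Arrays.asList(
                "aaaa-lock-0000000003",
                "bbbb-lock-0000000004");
        System.out.println(findOwnNode(children, "bbbb"));
        System.out.println(findOwnNode(children, "cccc"));
    }
}
```

If the lookup finds a node, the caller knows the create went through and can either keep the node or delete it.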
Curator could handle this by checking if the current thread is interrupted
before attempting to create the EPHEMERAL_SEQUENTIAL node.
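A minimal sketch of that pre-check, again without any ZooKeeper dependency. InterruptGuard and callUnlessInterrupted are hypothetical names, and the Supplier stands in for the real create() call. Note that this only narrows the window: the thread can still be interrupted after the check while it is waiting on the response, so it would not replace cleanup of an orphaned node.

```java
import java.util.function.Supplier;

class InterruptGuard {
    // Refuses to run the create if the calling thread is already interrupted.
    // Thread.interrupted() clears the flag, matching the usual convention for
    // methods that throw InterruptedException.
    static <T> T callUnlessInterrupted(Supplier<T> createCall) throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("interrupted before create was sent");
        }
        return createCall.get();
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        try {
            // Stand-in for zk.create(...); it is never invoked here.
            callUnlessInterrupted(() -> "/test/x-0000000000");
            System.out.println("create was attempted");
        } catch (InterruptedException e) {
            System.out.println("create was skipped");
        }
    }
}
```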
Here's sample code, FYI:
// "server" is a test ZooKeeper server started elsewhere
String connectString = server.getConnectString();
final CountDownLatch latch = new CountDownLatch(1);
Watcher watcher = new Watcher()
{
    @Override
    public void process(WatchedEvent event)
    {
        if ( event.getState() == Event.KeeperState.SyncConnected )
        {
            latch.countDown();
        }
    }
};
ZooKeeper zk = new ZooKeeper(connectString, 10000, watcher);
latch.await();  // wait until the client is connected
try
{
    zk.create("/test", new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT);
    // interrupt this thread just before the sequential create
    Thread.currentThread().interrupt();
    zk.create("/test/x-", new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL_SEQUENTIAL);
}
catch ( InterruptedException e )
{
    Thread.interrupted(); // clear
}
// the x- node shows up here even though create() threw
System.out.println(zk.getChildren("/test", false));
zk.close();