One of our engineers has built a pretty convincing manual test which shows that the ZooKeeper client leaks a few file descriptors every few seconds when the connection attempt fails with a "Network is unreachable" error.
If the process stays below its maximum file descriptor limit, the client recovers when the network comes back. If the limit is reached, the client never recovers, even after the network is restored.

Is this a known issue? I am building a test to demonstrate the problem and to experiment across versions, but if somebody has already broken this trail, I would love to hear about it.

On the testing side, I am also all ears if anyone has ideas for building a good unit test. Right now it seems to require mocking the network connection, which doesn't sound fun.
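One possible way to test for the leak without a real network outage is to count the JVM's open file descriptors around repeated failed connects. The sketch below is an illustration under stated assumptions, not ZooKeeper code. The `Connector` interface and the `UNREACHABLE` stub are hypothetical hooks that throw `SocketException("Network is unreachable")` instead of touching the network. `leakyAttempt` models one plausible shape of the bug, where a channel is opened, `connect` throws, and nothing closes the channel. The count comes from `com.sun.management.UnixOperatingSystemMXBean.getOpenFileDescriptorCount()`, which is only available on HotSpot-style JVMs on Unix-like systems.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.SocketException;
import java.nio.channels.SocketChannel;

public class FdLeakCheck {

    /** Hypothetical hook standing in for the real connect call (not a ZooKeeper API). */
    interface Connector {
        void connect(SocketChannel ch) throws IOException;
    }

    /** Hypothetical stub: always fails the way an unreachable network would. */
    static final Connector UNREACHABLE = ch -> {
        throw new SocketException("Network is unreachable");
    };

    /** Open fds in this JVM, or -1 if the platform MXBean cannot report it. */
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Suspected bug shape: the channel is opened, connect throws, and nobody closes it. */
    static SocketChannel leakyAttempt(Connector connector) {
        try {
            SocketChannel ch = SocketChannel.open(); // allocates a socket fd
            connector.connect(ch);
            return ch;
        } catch (IOException e) {
            return null; // exception swallowed; the channel's fd stays open
        }
    }

    /** Fixed shape: close the channel on any connect failure. */
    static SocketChannel safeAttempt(Connector connector) {
        SocketChannel ch = null;
        try {
            ch = SocketChannel.open();
            connector.connect(ch);
            return ch; // on success the caller owns the channel
        } catch (IOException e) {
            if (ch != null) {
                try {
                    ch.close(); // releases the fd
                } catch (IOException ignored) {
                    // nothing useful to do if close itself fails
                }
            }
            return null;
        }
    }

    /** Runs the action n times and reports how many fds were gained. */
    static long fdDelta(Runnable action, int n) {
        long before = openFds();
        for (int i = 0; i < n; i++) {
            action.run();
        }
        return openFds() - before;
    }

    public static void main(String[] args) {
        if (openFds() < 0) {
            System.out.println("open fd count not available on this JVM/OS");
            return;
        }
        safeAttempt(UNREACHABLE); // warm-up so one-time JDK initialization is not counted
        System.out.println("safe delta:  " + fdDelta(() -> safeAttempt(UNREACHABLE), 10));
        System.out.println("leaky delta: " + fdDelta(() -> leakyAttempt(UNREACHABLE), 10));
    }
}
```

A regression test built this way would assert that the delta stays at 0 across many failed reconnects. Wiring such a failing hook into the real client's connect path is left open here, since the hook above does not correspond to anything in ZooKeeper.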
