FWIW, I pored over this and the NIO code a bit yesterday and couldn't find anything obviously wrong, but NIO is a tricky beast. Is it possible that because the channel never gets connected, and so we never call select, the selector never cleans up the cancelledKeys and therefore hangs on to the fd?
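To make the question concrete, here is a minimal sketch of the documented Selector behavior I have in mind. It is not ZooKeeper's client code; the class and method names (CancelledKeyDemo, keyCountsAroundSelect) are made up for illustration. It only shows that a key cancelled by closing a registered channel stays in the selector's key set until the next selection operation. Whether the underlying fd is actually held open in the meantime depends on the JDK implementation, so treat that part as an assumption to verify rather than something this demonstrates.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of ZooKeeper.
class CancelledKeyDemo {

    // Returns {keys registered after close, keys registered after selectNow}.
    static int[] keyCountsAroundSelect() throws IOException {
        try (Selector selector = Selector.open()) {
            // A channel that is registered but never connected.
            SocketChannel channel = SocketChannel.open();
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_CONNECT);

            // Closing the channel cancels its key, but deregistration is
            // deferred until the selector's next selection operation.
            channel.close();
            int beforeSelect = selector.keys().size();

            // A selection operation processes the cancelled-key set.
            selector.selectNow();
            int afterSelect = selector.keys().size();

            return new int[] {beforeSelect, afterSelect};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = keyCountsAroundSelect();
        System.out.println("keys before select: " + counts[0]);
        System.out.println("keys after select:  " + counts[1]);
    }
}
```

If the client's connect path closes the channel and then never reaches a select call, the cancelled key would sit there in the same way, which is the scenario I am asking about.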

-----Original Message-----
From: Patrick Hunt [mailto:[email protected]]
Sent: Thursday, September 08, 2011 2:21 PM
To: [email protected]
Subject: Re: file descriptor leak in client code?

I don't think it's a known issue, please enter a jira. We have had one/two of these in the past, but we've resolved them. I would suggest aspectj. I've used this quite successfully in the past to find networking and filesystem issues in ZooKeeper. Not sure how easy it would be to create a unit test though (I've always verified it manually)

Patrick

On Wed, Sep 7, 2011 at 12:00 PM, Ted Dunning <[email protected]> wrote:
> One of our engineers has built a pretty convincing manual test that
> demonstrates that the Zookeeper leaks a few file descriptors every few
> seconds if the attempt to connect throws a network unreachable.
>
> If the max file descriptor limit is not reached, the client recovers when
> the network comes back.
>
> If the max file descriptor limit is reached, then the client never recovers
> even when the network comes back.
>
> Is this a known issue?
>
> I am building a test to demonstrate the problem and experiment across
> versions, but if somebody has broken this trail before, I would love to know
> about it.
>
> On the topic of testing this, I am also all ears if somebody has any ideas
> for how to build a nice unit test for this. Right now something like
> mocking the network connection seems required. That doesn't sound fun.
