FWIW, I pored over this and the NIO code a bit yesterday and couldn't find 
anything obviously wrong, but NIO is a tricky beast. Is it possible that,
because the channel never gets connected, we never call select(), so the
selector never cleans up its cancelledKeys and therefore hangs on to the fd?
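
To make that hypothesis concrete, here is a minimal sketch against plain
java.nio, not the ZooKeeper client code. It registers a few non-blocking
channels with a selector, closes them without ever selecting, and shows that
the cancelled keys stay in the selector's key set until the next selection
operation. The class name CancelledKeyDemo and the run() helper are made up
for illustration. Whether the client actually hits this path is an assumption
that still needs checking.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of ZooKeeper.
class CancelledKeyDemo {

    // Returns {key count before selectNow(), key count after selectNow()}.
    static int[] run(int attempts) throws IOException {
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < attempts; i++) {
                SocketChannel ch = SocketChannel.open();
                ch.configureBlocking(false);
                ch.register(selector, SelectionKey.OP_CONNECT);
                // Stand-in for a failed connect attempt: the channel is
                // closed, which cancels its key, but nobody calls select.
                ch.close();
            }
            // Cancelled keys are still held by the selector at this point.
            int before = selector.keys().size();
            // A selection operation deregisters the cancelled keys.
            selector.selectNow();
            int after = selector.keys().size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run(5);
        System.out.println("keys before selectNow(): " + counts[0]
                + ", after: " + counts[1]);
    }
}
```

If the key counts only drop after selectNow(), then a reconnect loop that
never reaches select would keep accumulating cancelled keys, which fits the
symptom described below.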

-----Original Message-----
From: Patrick Hunt [mailto:[email protected]] 
Sent: Thursday, September 08, 2011 2:21 PM
To: [email protected]
Subject: Re: file descriptor leak in client code?

I don't think it's a known issue; please enter a JIRA. We have had
one or two of these in the past, but we've resolved them.

I would suggest AspectJ. I've used it quite successfully in the past
to find networking and filesystem issues in ZooKeeper. I'm not sure how
easy it would be to create a unit test, though (I've always verified it
manually).

Patrick

On Wed, Sep 7, 2011 at 12:00 PM, Ted Dunning <[email protected]> wrote:
> One of our engineers has built a pretty convincing manual test that
> demonstrates that the ZooKeeper client leaks a few file descriptors every
> few seconds if the attempt to connect throws a "network unreachable" error.
>
> If the max file descriptor limit is not reached, the client recovers when
> the network comes back.
>
> If the max file descriptor limit is reached, then the client never recovers
> even when the network comes back.
>
> Is this a known issue?
>
> I am building a test to demonstrate the problem and experiment across
> versions, but if somebody has broken this trail before, I would love to know
> about it.
>
> On the topic of testing this, I am also all ears if somebody has any ideas
> for how to build a nice unit test for this.  Right now something like
> mocking the network connection seems required.  That doesn't sound fun.
>
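
On the unit-test question, one possible building block is to sample the
process's open file descriptor count before and after a batch of failed
connect attempts and fail the test if it keeps growing. The sketch below
assumes a Unix-like JVM that exposes
com.sun.management.UnixOperatingSystemMXBean. The FdCounter class and its
growth() helper are hypothetical names, and the action passed in (for example,
a connect attempt against an unreachable address) is left to the caller.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; not part of ZooKeeper.
class FdCounter {

    // Open file descriptor count for this JVM, or -1 if the platform
    // does not expose it (for example, on non-Unix JVMs).
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Runs the action repeatedly and returns how many descriptors were
    // gained (or lost) over the whole run. The result is only a rough
    // signal, since other threads may open or close files meanwhile.
    static long growth(Runnable action, int iterations) {
        long before = openFds();
        for (int i = 0; i < iterations; i++) {
            action.run();
        }
        return openFds() - before;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFds());
    }
}
```

A test could call growth() with an action that triggers one failed connect
and assert that the result stays below some small bound. This still needs a
way to force the network-unreachable failure, so it does not remove the need
for mocking or a manual setup.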
