Brian Utterback wrote:
> Assuming you could find a way to dump the array, doesn't this just give
> you a list of ports whose connections are currently in CLOSE_WAIT?
> Wouldn't netstat give you the same info?
> 
> Instead of setting the array value to 1, you could set it to the value
> of walltimestamp. That way when you dumped it out, you would have the
> time it went into CLOSE_WAIT, which would give you an indication of
> which ones were in the state the longest. I wonder if you could get an
> aggregation to work here? Hmm.

In about 99 and 44/100ths percent of the cases I've looked at in the
past, what appears to be a "leak" is actually something exacerbated by
the OS.

What I usually see is that the application opens a socket (via socket()
or accept()), does some work, and then closes the socket normally.
Unbeknownst to the application, part of that "work" involved a fork(),
perhaps buried in a library somewhere.  (The free fork() given out to
users of syslog() employing LOG_CONS was once a possible cause, but
there are others.)

The fork() logic duplicates all of the open file descriptors, and the
code calling fork() in this case doesn't "know" that there are
descriptors it shouldn't be copying, so it can't easily close them
afterwards.  It's the new process -- possibly completely unknown to the
main application -- that's still holding the socket open, allowing it to
slip into CLOSE_WAIT state.
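
A stripped-down sketch of that sequence (the port number and the
library_work() stand-in are made up for illustration; error handling
omitted):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    /* Stand-in for a library routine that quietly fork()s a helper. */
    static void
    library_work(void)
    {
        if (fork() == 0) {
            /*
             * Child: inherits every open descriptor, including the
             * accepted socket it knows nothing about.
             */
            (void) sleep(3600);
            _exit(0);
        }
    }

    int
    main(void)
    {
        struct sockaddr_in sin;
        int lsock, conn;

        lsock = socket(AF_INET, SOCK_STREAM, 0);
        (void) memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(12345);
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        (void) bind(lsock, (struct sockaddr *)&sin, sizeof (sin));
        (void) listen(lsock, 5);

        conn = accept(lsock, NULL, NULL);
        library_work();         /* fork() buried in here */
        (void) close(conn);     /* the application did the right thing */

        /*
         * The child's copy of 'conn' is still open.  When the remote
         * peer closes, this end sits in CLOSE_WAIT until the child
         * exits, and the "leak" gets blamed on the main application.
         */
        (void) pause();
        return (0);
    }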

For that reason, I think any CLOSE_WAIT diagnostic function should at
least track the fork() descriptor duplication and allow you to trace
back to the application that "leaked" descriptors by way of creating new
processes.
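
Short of kernel support, one rough user-land approximation is an
LD_PRELOAD interposer on fork() that logs which descriptors are about
to be duplicated, so a lingering CLOSE_WAIT can later be matched to the
child that inherited it.  Something along these lines (illustrative
only; RTLD_NEXT may need -D_GNU_SOURCE on Linux):

    #include <sys/types.h>
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    pid_t
    fork(void)
    {
        pid_t (*real_fork)(void) =
            (pid_t (*)(void))dlsym(RTLD_NEXT, "fork");
        int fd;

        /* Note every descriptor the child is about to inherit. */
        for (fd = 0; fd < 256; fd++) {
            if (fcntl(fd, F_GETFD) != -1)
                (void) fprintf(stderr,
                    "pid %ld forking with fd %d open\n",
                    (long)getpid(), fd);
        }
        return (real_fork());
    }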

(Would be nice to have something like z/OS's FCTLCLOFORK or the
sometimes-discussed Linux FD_DONTINHERIT flag.)

-- 
James Carlson         42.703N 71.076W         <carls...@workingcode.com>
_______________________________________________
networking-discuss mailing list
networking-discuss@opensolaris.org
