Brian Utterback wrote:
> Assuming you could find a way to dump the array, doesn't this just give
> you a list of ports whose connections are currently in CLOSE_WAIT?
> Wouldn't netstat give you the same info?
>
> Instead of setting the array value to 1, you could set it to the value
> of walltimestamp. That way when you dumped it out, you would have the
> time it went into CLOSE_WAIT, which would give you an indication of
> which ones were in the state the longest. I wonder if you could get an
> aggregation to work here? Hmm.
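
For reference, the walltimestamp idea might look roughly like the D sketch
below. It assumes a build new enough to have the DTrace tcp provider's
state-change probe, and I haven't run it, so treat it as a sketch rather
than a working script.

    #!/usr/sbin/dtrace -s
    /* args[3]->tcps_state is the new TCP state, args[5]->tcps_state the old */

    tcp:::state-change
    /args[3]->tcps_state == TCP_STATE_CLOSE_WAIT/
    {
            /* remember when this connection went into CLOSE_WAIT */
            entered[args[3]->tcps_lport, args[3]->tcps_rport] = walltimestamp;
    }

    tcp:::state-change
    /args[5]->tcps_state == TCP_STATE_CLOSE_WAIT &&
        entered[args[3]->tcps_lport, args[3]->tcps_rport] != 0/
    {
            /* it moved on (normally to LAST_ACK); record the dwell time in ns */
            @dwell[args[3]->tcps_lport] = quantize(walltimestamp -
                entered[args[3]->tcps_lport, args[3]->tcps_rport]);
            entered[args[3]->tcps_lport, args[3]->tcps_rport] = 0;
    }

Connections that never leave CLOSE_WAIT never fire the second clause, so the
histogram printed when you stop the script only covers connections that
eventually closed; the stuck ones are still only in the array, which is
exactly the dump problem above.
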
In about 99 and 44/100ths percent of the cases I've looked at in the past, what appears to be a "leak" is actually something exacerbated by the OS. What I usually see is that the application opens a socket (via socket() or accept()), does some work, and then closes the socket normally. Unbeknownst to the application, part of that "work" involved a fork(), perhaps buried in a library somewhere. (The free fork() given out to users of syslog() employing LOG_CONS was once a possible cause, but there are others.)

The fork() logic duplicates all of the open file descriptors, and the code calling fork() in this case doesn't "know" that there are descriptors it shouldn't be copying, so it can't easily close them afterwards. It's the new process (possibly completely unknown to the main application) that's still holding the socket open, so once the peer shuts down its side the connection sits in CLOSE_WAIT indefinitely.

For that reason, I think any CLOSE_WAIT diagnostic function should at least track the fork() descriptor duplication and allow you to trace back to the application that "leaked" descriptors by way of creating new processes. (Would be nice to have something like z/OS's FCTLCLOFORK or the sometimes-discussed Linux FD_DONTINHERIT flag.) A bare-bones sketch of the process-trail half of that is below the signature.

-- 
James Carlson         42.703N 71.076W          <carls...@workingcode.com>
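
P.S. A bare-bones way to get at least the process trail today is to log
every fork() while the application runs. The sketch below doesn't track the
descriptor duplication itself, just who created whom, but when a socket
turns up stuck in CLOSE_WAIT inside some child you can match that child
back to its parent:

    #!/usr/sbin/dtrace -s

    proc:::create
    {
            /* one line per fork(): parent name/pid and the new child's pid */
            printf("%Y %s[%d] forked child %d\n",
                walltimestamp, execname, pid, args[0]->pr_pid);
    }

Running pfiles(1) on the suspect children will then show which of them
still holds the socket.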