On 27/10/2015 09:08, casper....@oracle.com wrote:

Generally I wouldn't see that as a problem, but in the case of a socket
blocking on accept indefinitely, I do see it as a problem especially as
the thread actually wants to stop listening.

But in general, this is basically a problem with the application: the file
descriptor space is shared between threads and having one thread sniping
at open files, you do have a problem and whatever the kernel does in that
case perhaps doesn't matter all that much: the application needs to be
fixed anyway.

The scenario in Hadoop is that the FD is being used by a thread that's waiting in accept() and another thread wants to shut it down, e.g. because the application is terminating and needs to stop all threads cleanly. I agree the use of shutdown()+close() on Linux or dup2() on Solaris is pretty much an application-level hack - the concern in both cases is that the file descriptor that's being used in the accept() might be recycled by another thread. However, that just raises the question of why the FD isn't properly encapsulated by the application in a singleton object, with the required shutdown semantics provided by a mechanism to invalidate the singleton and its contained FD.

There are other mechanisms that could be used to do a clean shutdown that don't require the OS to provide workarounds for arguably broken application behaviour. For example, the application can set a 'shutdown' flag in the object and then do a dummy connect() to the address the accepting FD is listening on. That kicks the thread off the accept(), so it re-checks the 'shutdown' flag and doesn't re-enter the accept().
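As a rough illustration of the flag-plus-dummy-connect idea, here is a minimal sketch in Python. The Listener class and its attribute names are hypothetical and are not taken from Hadoop; it assumes a TCP listener on the loopback interface and that the wake-up connection can reach it.

```python
import socket
import threading


class Listener:
    """Hypothetical wrapper that owns the listening socket and its accept thread."""

    def __init__(self, host="127.0.0.1", port=0):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind((host, port))
        self._sock.listen(16)
        self._addr = self._sock.getsockname()
        self._stopping = threading.Event()  # the 'shutdown' flag
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while not self._stopping.is_set():
            conn, _ = self._sock.accept()
            with conn:
                if self._stopping.is_set():
                    break  # woken up in order to stop, do not re-enter accept()
                # ... a real server would hand conn to a worker here ...
        # The socket is closed only by the thread that was using it.
        self._sock.close()

    def stop(self):
        self._stopping.set()
        # Dummy connect: its only purpose is to make accept() return.
        try:
            with socket.create_connection(self._addr, timeout=5):
                pass
        except OSError:
            pass  # the accept thread may already have seen the flag and closed
        self._thread.join()
```

Because the accepting thread is the only one that ever closes the socket, no other thread can recycle the descriptor underneath it. The sketch does not handle the case where the dummy connection never arrives (for example a full backlog), in which case join() would block.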

If the object encapsulating an FD is invalidated, and that prevents the FD being used any more because the only access is via that object, then it simply doesn't matter if the FD is reused elsewhere: there can be no race, so a complicated, platform-dependent dance isn't needed.
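To make the encapsulation argument concrete, here is a minimal sketch in Python. The OwnedFd class is hypothetical; it assumes every use of the descriptor goes through the object and that operations are short. A thread blocked in accept() would still need a wake-up such as the dummy connect().

```python
import os
import threading


class OwnedFd:
    """Hypothetical owner of a file descriptor: the raw number never escapes."""

    def __init__(self, fd):
        self._fd = fd
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            if self._fd is None:
                raise ValueError("descriptor has been invalidated")
            return os.write(self._fd, data)

    def invalidate(self):
        # Forget the descriptor under the lock, so no later call can use it.
        with self._lock:
            fd, self._fd = self._fd, None
        # Close it afterwards; a second call finds None and does nothing.
        if fd is not None:
            os.close(fd)
```

After invalidate(), the kernel is free to hand the same descriptor number to any other thread. Nothing holding the OwnedFd can reach it, so the reuse is harmless.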

Unfortunately Hadoop isn't the only thing that pulls the shutdown() trick, so I don't think there's a simple fix for this, as discussed earlier in the thread. Having said that, if close() on Linux also did an implicit shutdown() it would mean that well-written applications that handled the scoping, sharing and reuse of FDs properly could just call close() and have it work the same way across *NIX platforms.

--
Alan Burlison
--