[ 
https://issues.apache.org/jira/browse/HADOOP-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HADOOP-12487:
------------------------------------------
    Summary: DomainSocket.close() assumes Linux behaviour  (was: 
DomainSocket.close() assumes incorrect Linux behaviour)

> DomainSocket.close() assumes Linux behaviour
> --------------------------------------------
>
>                 Key: HADOOP-12487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12487
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: net
>    Affects Versions: 2.7.1
>         Environment: Linux Solaris
>            Reporter: Alan Burlison
>            Assignee: Alan Burlison
>         Attachments: shutdown.c
>
>
> I'm getting a test failure in TestDomainSocket.java, in the 
> testSocketAcceptAndClose test. That test creates a socket which one thread 
> waits on in DomainSocket.accept() whilst a second thread sleeps for a short 
> time before closing the same socket with DomainSocket.close().
> DomainSocket.close() first calls shutdown0() on the socket before closing 
> close0() - both those are thin wrappers around the corresponding libc socket 
> calls. DomainSocket.close() contains the following comment, explaining the 
> logic involved:
> {code}
>           // Calling shutdown on the socket will interrupt blocking system
>           // calls like accept, write, and read that are going on in a
>           // different thread.
> {code}
> Unfortunately that relies on non-standards-compliant Linux behaviour. I've 
> written a simple C test case that replicates the scenario above:
> # ThreadA opens, binds, listens and accepts on a socket, waiting for 
> connections.
> # Some time later ThreadB calls shutdown on the socket ThreadA is waiting in 
> accept on.
> Here is what happens:
> On Linux, the shutdown call in ThreadB succeeds and the accept call in 
> ThreadA returns with EINVAL.
> On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA 
> continues to wait in accept.
> Relevant POSIX manpages:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html
> The POSIX shutdown manpage says:
> "The shutdown() function shall cause all or part of a full-duplex connection 
> on the socket associated with the file descriptor socket to be shut down."
> ...
> "\[ENOTCONN] The socket is not connected."
> Page 229 & 303 of "UNIX System V Network Programming" say:
> "shutdown can only be called on sockets that have been previously connected"
> "The socket \[passed to accept that] fd refers to does not participate in the 
> connection. It remains available to receive further connect indications"
> That is pretty clear, sockets being waited on with accept are not connected 
> by definition. Nor is it the accept socket connected when a client connects 
> to it, it is the socket returned by accept that is connected to the client. 
> Therefore the Solaris behaviour of failing the shutdown call is correct.
> In order to get the required behaviour of ThreadB causing ThreadA to exit the 
> accept call with an error, the correct way is for ThreadB to call close on 
> the socket that ThreadA is waiting on in accept.
> On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA 
> fails and returns EBADF.
> On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in 
> accept until there is an incoming connection. That accept returns 
> successfully. However subsequent accept calls on the same socket return EBADF.
> The Linux behaviour is fundamentally broken in three places:
> # Allowing shutdown to succeed on an unconnected socket is incorrect.  
> # Returning a successful accept on a closed file descriptor is incorrect, 
> especially as future accept calls on the same socket fail.
> # Once shutdown has been called on the socket, calling close on the socket 
> fails with EBADF. That is incorrect, shutdown should just prevent further IO 
> on the socket, it should not close it.
> The real issue though is that there's no single way of doing this that works 
> on both Solaris and Linux, there will need to be platform-specific code in 
> Hadoop to cater for the Linux brokenness. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to