[
https://issues.apache.org/jira/browse/HADOOP-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042151#comment-15042151
]
Colin Patrick McCabe edited comment on HADOOP-12487 at 12/4/15 8:10 PM:
------------------------------------------------------------------------
It looks like the patch can't be applied to trunk. Can you update it?
was (Author: cmccabe):
It looks like the patch can't be applied.
> DomainSocket.close() assumes incorrect Linux behaviour
> ------------------------------------------------------
>
> Key: HADOOP-12487
> URL: https://issues.apache.org/jira/browse/HADOOP-12487
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: net
> Affects Versions: 2.7.1
> Environment: Linux Solaris
> Reporter: Alan Burlison
> Assignee: Alan Burlison
> Attachments: HADOOP-12487.001.patch, HADOOP-12487.002.patch,
> HADOOP-12487.003.patch, HADOOP-12487.004.patch, shutdown.c
>
>
> I'm getting a test failure in TestDomainSocket.java, in the
> testSocketAcceptAndClose test. That test creates a socket which one thread
> waits on in DomainSocket.accept() whilst a second thread sleeps for a short
> time before closing the same socket with DomainSocket.close().
> DomainSocket.close() first calls shutdown0() on the socket before calling
> close0(). Both are thin wrappers around the corresponding libc socket
> calls. DomainSocket.close() contains the following comment, explaining the
> logic involved:
> {code}
> // Calling shutdown on the socket will interrupt blocking system
> // calls like accept, write, and read that are going on in a
> // different thread.
> {code}
> Unfortunately that relies on non-standards-compliant Linux behaviour. I've
> written a simple C test case that replicates the scenario above:
> # ThreadA opens, binds, listens and accepts on a socket, waiting for
> connections.
> # Some time later ThreadB calls shutdown on the socket ThreadA is waiting in
> accept on.
> Here is what happens:
> On Linux, the shutdown call in ThreadB succeeds and the accept call in
> ThreadA returns with EINVAL.
> On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA
> continues to wait in accept.
> Relevant POSIX manpages:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html
> The POSIX shutdown manpage says:
> "The shutdown() function shall cause all or part of a full-duplex connection
> on the socket associated with the file descriptor socket to be shut down."
> ...
> "\[ENOTCONN] The socket is not connected."
> Pages 229 and 303 of "UNIX System V Network Programming" say:
> "shutdown can only be called on sockets that have been previously connected"
> "The socket \[passed to accept that] fd refers to does not participate in the
> connection. It remains available to receive further connect indications"
> That is pretty clear: sockets being waited on with accept are not connected
> by definition. Nor is the accept socket connected when a client connects
> to it; it is the socket returned by accept that is connected to the client.
> Therefore the Solaris behaviour of failing the shutdown call is correct.
> In order to get the required behaviour of ThreadB causing ThreadA to exit the
> accept call with an error, the correct way is for ThreadB to call close on
> the socket that ThreadA is waiting on in accept.
> On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA
> fails and returns EBADF.
> On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in
> accept until there is an incoming connection. That accept returns
> successfully. However subsequent accept calls on the same socket return EBADF.
> The Linux behaviour is fundamentally broken in three places:
> # Allowing shutdown to succeed on an unconnected socket is incorrect.
> # Returning a successful accept on a closed file descriptor is incorrect,
> especially as future accept calls on the same socket fail.
> # Once shutdown has been called on the socket, calling close on the socket
> fails with EBADF. That is incorrect: shutdown should just prevent further IO
> on the socket; it should not close it.
> The real issue though is that there's no single way of doing this that works
> on both Solaris and Linux, so there will need to be platform-specific code in
> Hadoop to cater for the Linux brokenness.
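
The two-thread scenario in the description (ThreadA blocked in accept, ThreadB calling shutdown or close on the same socket) can be approximated with the sketch below. It is not the attached shutdown.c; every name in it (run_probe, probe_result, the /tmp socket path) is hypothetical, and the 200 ms and 1000 ms delays are arbitrary assumptions. If ThreadA is still blocked after ThreadB's call, the sketch connects a client so that accept can return, and records that it had to.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum probe_mode { PROBE_SHUTDOWN, PROBE_CLOSE };

/* What each thread observed. Hypothetical names, not taken from shutdown.c. */
struct probe_result {
    int waker_rc, waker_errno;   /* ThreadB: result of shutdown() or close() */
    int accept_rc, accept_errno; /* ThreadA: result of accept() */
    int needed_client;           /* 1 if a client connect was needed to unblock */
};

struct thread_a { int listen_fd, done_fd, rc, err; };

/* ThreadA: block in accept(), record the outcome, then signal via the pipe. */
static void *thread_a_main(void *arg)
{
    struct thread_a *a = arg;
    int fd = accept(a->listen_fd, NULL, NULL);
    a->err = (fd < 0) ? errno : 0;
    a->rc = fd;
    if (fd >= 0)
        close(fd);
    ssize_t n = write(a->done_fd, "x", 1);
    (void)n;
    return NULL;
}

/* Wait up to timeout_ms for ThreadA to report that accept() returned. */
static int accept_returned(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, timeout_ms) == 1;
}

/* Runs one experiment; returns 0 if it could be set up, -1 otherwise. */
int run_probe(enum probe_mode mode, struct probe_result *out)
{
    struct sockaddr_un addr;
    struct thread_a a = { .listen_fd = -1, .done_fd = -1, .rc = -1, .err = 0 };
    pthread_t tid;
    int done[2];

    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/accept_probe.%ld.sock", (long)getpid());
    unlink(addr.sun_path);

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(lfd, 1) != 0 || pipe(done) != 0) {
        close(lfd);
        unlink(addr.sun_path);
        return -1;
    }
    a.listen_fd = lfd;
    a.done_fd = done[1];
    if (pthread_create(&tid, NULL, thread_a_main, &a) != 0) {
        close(lfd); close(done[0]); close(done[1]);
        unlink(addr.sun_path);
        return -1;
    }

    poll(NULL, 0, 200);  /* give ThreadA time to block in accept() */

    /* ThreadB: the call under test. */
    out->waker_rc = (mode == PROBE_SHUTDOWN) ? shutdown(lfd, SHUT_RDWR)
                                             : close(lfd);
    out->waker_errno = (out->waker_rc != 0) ? errno : 0;

    if (!accept_returned(done[0], 1000)) {
        /* ThreadA is still blocked: connect a client so accept() can return. */
        int c = socket(AF_UNIX, SOCK_STREAM, 0);
        out->needed_client = 1;
        if (c >= 0) {
            connect(c, (struct sockaddr *)&addr, sizeof(addr));
            close(c);
        }
        if (!accept_returned(done[0], 1000))
            pthread_cancel(tid);  /* last resort: accept() is a cancellation point */
    }
    pthread_join(tid, NULL);
    out->accept_rc = a.rc;
    out->accept_errno = a.err;

    if (mode == PROBE_SHUTDOWN)
        close(lfd);  /* in PROBE_CLOSE mode ThreadB already closed it */
    close(done[0]);
    close(done[1]);
    unlink(addr.sun_path);
    return 0;
}
```

Calling run_probe(PROBE_SHUTDOWN, &r) and run_probe(PROBE_CLOSE, &r) from a small main and printing the fields of r should make it possible to compare a given platform against the results reported in the description. The sketch uses an AF_UNIX socket because DomainSocket does; on some systems it may need to be linked with -pthread.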
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)