[ 
https://issues.apache.org/jira/browse/HADOOP-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242395#comment-13242395
 ] 

Daryn Sharp commented on HADOOP-8221:
-------------------------------------

Yes, I should have commented that {{throwIOExceptionFromConnection}} is wrong.  
HTTP error codes do not generate an exception.  If an exception occurs during 
the connect, something went seriously wrong: connect error, couldn't send 
request, SSL negotiation failed, etc.  Thus the client is left in a blocking 
read waiting for a response that will never come.  Methods like 
{{getInputStream}} are what read the response code and headers.
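
For illustration only (this is not the attached patch): a minimal sketch of how connect and read timeouts can be set on a {{java.net.HttpURLConnection}} before any blocking call such as {{getInputStream}} is made. The class name, method name, and default values are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class HftpTimeoutSketch {
    // Hypothetical defaults; the real values would be a configuration decision.
    static final int CONNECT_TIMEOUT_MS = 60_000;
    static final int READ_TIMEOUT_MS = 60_000;

    /**
     * Opens (but does not yet connect) an HTTP connection with both timeouts set.
     * openConnection() performs no network I/O; the socket is only created on
     * connect(), getResponseCode(), getInputStream(), and similar calls.
     */
    static HttpURLConnection openWithTimeouts(URL url, int connectMs, int readMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Bounds the TCP connect; 0 (the JDK default) means wait indefinitely.
        conn.setConnectTimeout(connectMs);
        // Bounds each blocking read, including the read of the status line
        // and headers; 0 (the JDK default) means wait indefinitely.
        conn.setReadTimeout(readMs);
        return conn;
    }

    static HttpURLConnection openWithTimeouts(URL url) throws IOException {
        return openWithTimeouts(url, CONNECT_TIMEOUT_MS, READ_TIMEOUT_MS);
    }
}
```

With a non-zero read timeout, a response that never arrives surfaces as a {{java.net.SocketTimeoutException}} instead of an indefinite hang.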

On a side note, it's also odd that the DN's Jetty doesn't have a timeout while 
waiting for a request.  Maybe it does, but when the DN is getting jammed the 
timeout isn't kicking in.  To clarify for others, we are addressing three 
problems:
# Socket is left dangling because the remote host closed the socket.  Not sure 
why unless there's a Linux kernel bug (unlikely?) or the TCP FIN packets were 
somehow lost.
# DN accepts the connection, but never sends a response.
# DN host becomes a "zombie".  The host is inexplicably hung such that you 
can't even ssh or console into the box.  Sockets connect into the listen 
backlog, but are never accepted and processed.
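
The last two cases can be approximated locally. The following is a rough sketch, not a reproduction of the actual DN failure: a listening socket that never calls {{accept()}} stands in for the hung host, so the client's connection completes in the kernel's listen backlog and the request is never answered. The class and method names are hypothetical, and the behaviour described assumes a typical Linux TCP stack.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

class ZombieServerSketch {
    /**
     * Returns true if the client gave up with a read timeout, false if it
     * somehow received a response. With a read timeout of 0 this call would
     * block indefinitely instead.
     */
    static boolean readTimesOut(int readTimeoutMs) throws IOException {
        // Bind a listening socket but never accept(): connections are
        // completed by the kernel and sit in the backlog, unprocessed.
        try (ServerSocket zombie =
                 new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            URL url = URI.create(
                "http://127.0.0.1:" + zombie.getLocalPort() + "/").toURL();
            // Bypass any configured proxy so the request goes to the local socket.
            HttpURLConnection conn =
                (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(readTimeoutMs);
            try {
                // Sends the request, then blocks reading the status line.
                conn.getResponseCode();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        }
    }
}
```

Calling {{ZombieServerSketch.readTimesOut(500)}} is expected to return after roughly half a second rather than hanging.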

I'll work on a patch for 1.x.
                
> Hftp connections do not have a timeout
> --------------------------------------
>
>                 Key: HADOOP-8221
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8221
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-8221.patch
>
>
> Hftp connections do not have read timeouts.  This leads to indefinitely hung 
> sockets when there is a network outage during which time the remote host 
> closed the socket.
> This may also affect WebHdfs, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
