[ https://issues.apache.org/jira/browse/HDFS-9384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HDFS-9384:
--------------------------------
    Attachment: HDFS-9384.001.patch

I'm attaching a patch to fix the problem.  I've run the test with this patch 
multiple times in multiple environments, and the failure no longer reproduces.

I left a comment in the patch explaining the problem in more detail.  I'm 
pasting it here for convenience:

{code}
          // The second request can be sent with Transfer-Encoding: chunked.
          // The Java HTTP client tends to split the headers and the chunked
          // body into separate writes, so the first read above likely only read
          // the headers.  We must fully consume the input to prevent a hang on
          // the client side.
{code}

{{TestWebHdfsTimeouts}} is an example of an existing similar test that already 
works correctly, because it follows the same strategy of fully consuming the 
input sent by the client.
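
As a rough illustration of that strategy (this is not the code in the patch), a hand-coded test server could drain each request along the lines of the sketch below.  The class {{RequestDrainer}} and its methods are hypothetical names invented for this example.  It assumes a well-formed HTTP/1.1 request whose body is framed by either Content-Length or Transfer-Encoding: chunked.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from HDFS-9384.001.patch.
final class RequestDrainer {

  /** Reads one line, dropping the line terminator; returns null at end of stream. */
  static String readLine(InputStream in) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    boolean sawAny = false;
    int b;
    while ((b = in.read()) != -1) {
      sawAny = true;
      if (b == '\n') {
        break;
      }
      if (b != '\r') {
        buf.write(b);
      }
    }
    return sawAny
        ? new String(buf.toByteArray(), StandardCharsets.ISO_8859_1) : null;
  }

  /** Reads and discards exactly n bytes, failing if the stream ends early. */
  static void discard(InputStream in, long n) throws IOException {
    byte[] buf = new byte[4096];
    while (n > 0) {
      int r = in.read(buf, 0, (int) Math.min(buf.length, n));
      if (r == -1) {
        throw new EOFException("stream ended before the body was fully read");
      }
      n -= r;
    }
  }

  /**
   * Consumes the request line, the headers, and the whole body of one request.
   * Returns the number of body bytes that were read and discarded.
   */
  static long drainRequest(InputStream in) throws IOException {
    boolean chunked = false;
    long contentLength = 0;
    String line;
    // The header section ends at the first empty line.
    while ((line = readLine(in)) != null && !line.isEmpty()) {
      String lower = line.toLowerCase(Locale.ROOT);
      if (lower.startsWith("transfer-encoding:") && lower.contains("chunked")) {
        chunked = true;
      } else if (lower.startsWith("content-length:")) {
        contentLength = Long.parseLong(line.substring(line.indexOf(':') + 1).trim());
      }
    }
    if (!chunked) {
      discard(in, contentLength);
      return contentLength;
    }
    long total = 0;
    while (true) {
      String sizeLine = readLine(in);
      if (sizeLine == null) {
        throw new EOFException("stream ended inside the chunked body");
      }
      // A chunk-size line is hexadecimal, optionally followed by ";extensions".
      int semi = sizeLine.indexOf(';');
      String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
      long size = Long.parseLong(hex, 16);
      if (size == 0) {
        break;  // the zero-length chunk marks the end of the body
      }
      discard(in, size);
      total += size;
      readLine(in);  // consume the CRLF that follows the chunk data
    }
    // Skip optional trailer fields up to the final empty line.
    while ((line = readLine(in)) != null && !line.isEmpty()) {
      // nothing to do; trailers are ignored
    }
    return total;
  }
}
```

A server thread would call {{RequestDrainer.drainRequest(socket.getInputStream())}} before writing its canned response.  Reading byte by byte keeps the sketch short; a real test server would probably wrap the socket stream in a {{BufferedInputStream}} first.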

> TestWebHdfsContentLength intermittently hangs and fails due to TCP 
> conversation mismatch between client and server.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9384
>                 URL: https://issues.apache.org/jira/browse/HDFS-9384
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Minor
>         Attachments: HDFS-9384.001.patch
>
>
> {{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
> background thread to simulate some WebHDFS server responses.  In some 
> environments (notably Windows), I have observed that the test can hang and 
> fail intermittently.  The root cause is that the server fails to fully 
> consume the client's input.  This causes a mismatch in the TCP conversation 
> state, and ultimately the client side hangs, then aborts after the 60-second 
> socket timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
