[ 
https://issues.apache.org/jira/browse/HDFS-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10293:
---------------------------------
    Attachment: HDFS-10293.000.patch

The code is as follows:


{code}
  static int readAll(FSDataInputStream in, byte[] buf) throws IOException {
    int readLen = 0;
    int ret;
    while ((ret = in.read(buf, readLen, buf.length - readLen)) >= 0 &&
        readLen <= buf.length) {
      readLen += ret;
    }
    return readLen;
  }
{code}

If {{readLen}} equals {{buf.length}}, then {{buf.length - readLen}} is zero, 
and {{in.read()}} simply returns zero without reading from the stream. In this 
case no exception is thrown, both loop conditions ({{ret >= 0}} and 
{{readLen <= buf.length}}) stay true, and the code is stuck in the while-loop.
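
The stuck loop can be reproduced without a cluster. The sketch below is only 
an illustration, not the test code: it runs the same loop over a plain 
{{java.io.InputStream}} instead of an {{FSDataInputStream}}, and the class 
name, the {{countZeroReads}} helper and the {{maxIterations}} cap are 
hypothetical, added so that the demo terminates. It assumes the stream still 
has data left once the buffer is full.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ZeroLengthReadDemo {
  // Same loop shape as readAll, plus a hypothetical iteration cap so the
  // demo stops. Returns how many iterations made no progress (read returned 0).
  static int countZeroReads(InputStream in, byte[] buf, int maxIterations)
      throws IOException {
    int readLen = 0;
    int ret;
    int zeroReads = 0;
    int iterations = 0;
    while (iterations++ < maxIterations
        && (ret = in.read(buf, readLen, buf.length - readLen)) >= 0
        && readLen <= buf.length) {
      if (ret == 0) {
        zeroReads++; // buffer already full: zero-length read, no progress
      }
      readLen += ret;
    }
    return zeroReads;
  }

  public static void main(String[] args) throws IOException {
    // 8 bytes available, 4-byte buffer: the first read fills the buffer,
    // every later iteration asks for 0 bytes and gets 0 back.
    InputStream in = new ByteArrayInputStream(new byte[8]);
    System.out.println("zero-progress iterations: "
        + countZeroReads(in, new byte[4], 10));
  }
}
```

Without the cap, every iteration after the first would be a zero-length read, 
which is the hang seen in the timed-out test.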

One possible fix is to tighten the condition to {{(ret = in.read(buf, readLen, 
buf.length - readLen)) > 0 && readLen < buf.length}}. A probably better fix is 
to use {{IOUtils.readFully()}}, which throws an IOException if it hits a 
premature EOF on the input stream; see the v0 patch.
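
For illustration only (this is not the v0 patch), the two options can be 
sketched against a plain {{java.io.InputStream}}. The class and method names 
are hypothetical. {{readAll}} is a variation of the stricter condition that 
checks {{readLen < buf.length}} before calling {{read}}, so a zero-length read 
is never issued. {{readFullyOrThrow}} only mimics the fail-fast behaviour 
described for {{IOUtils.readFully()}} and is not Hadoop's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllSketch {
  // Stops when the buffer is full or the stream ends (read returns -1).
  static int readAll(InputStream in, byte[] buf) throws IOException {
    int readLen = 0;
    int ret;
    while (readLen < buf.length
        && (ret = in.read(buf, readLen, buf.length - readLen)) > 0) {
      readLen += ret;
    }
    return readLen;
  }

  // Fail-fast variant: a short read becomes an exception instead of a
  // silently short result.
  static void readFullyOrThrow(InputStream in, byte[] buf) throws IOException {
    int readLen = readAll(in, buf);
    if (readLen < buf.length) {
      throw new EOFException("premature EOF after " + readLen + " of "
          + buf.length + " bytes");
    }
  }

  public static void main(String[] args) throws IOException {
    // Stream longer than the buffer: returns once the buffer is full.
    System.out.println(readAll(new ByteArrayInputStream(new byte[8]), new byte[4]));
    // Stream shorter than the buffer: returns the bytes actually read.
    System.out.println(readAll(new ByteArrayInputStream(new byte[2]), new byte[4]));
  }
}
```

The fail-fast form is arguably the better fit for a test helper, because a 
short read then fails the test immediately instead of surfacing later as a 
data mismatch.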

> StripedFileTestUtil#readAll flaky
> ---------------------------------
>
>                 Key: HDFS-10293
>                 URL: https://issues.apache.org/jira/browse/HDFS-10293
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding, test
>    Affects Versions: 3.0.0
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10293.000.patch
>
>
> The flaky test helper method causes several unit tests to fail intermittently. 
> For example, 
> {{TestDFSStripedOutputStreamWithFailure#testAddBlockWhenNoSufficientParityNumOfNodes}}
>  timed out in a recent run (see 
> [exception|https://builds.apache.org/job/PreCommit-HDFS-Build/15158/testReport/org.apache.hadoop.hdfs/TestDFSStripedOutputStreamWithFailure/testAddBlockWhenNoSufficientParityNumOfNodes/]),
>  which can be easily reproduced locally.
> Debugging the code suggests that the helper method is stuck in an 
> infinite loop. We need a fix to make the test robust.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
