[ https://issues.apache.org/jira/browse/HDFS-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893874#comment-16893874 ]

qiang Liu commented on HDFS-14669:
----------------------------------

Thanks [~ayushtkn] for your review. I rechecked the error; it is a timeout, 
something like this:
{panel:title=the timeout error}
java.util.concurrent.TimeoutException: Timed out waiting for condition. Thread 
diagnostics:
Timestamp: 2019-07-26 02:45:04,335
{panel}
I think this misleading exception is caused by checking the scan result using 
GenericTestUtils.waitFor:
{code:java}
GenericTestUtils.waitFor(() -> {
  try {
    // verifyStats() asserts on the current scan statistics; an AssertionError
    // here just means the expected state has not been reached yet.
    bpid = cluster.getNamesystem(1).getBlockPoolId();
    verifyStats(bp1Files, 0, 0, 0, 0, 0, 0);
    bpid = cluster.getNamesystem(3).getBlockPoolId();
    verifyStats(bp2Files, 0, 0, 0, 0, 0, 0);
  } catch (AssertionError ex) {
    return false;
  }
  return true;
}, 50, 2000); // poll every 50 ms, time out after 2000 ms
{code}
In addition, I really wonder whether this kind of check is necessary, meaning: 
is there a chance that the file's block has not actually been created after 
fs.create() has returned?

Anyway, even if there is such a chance, a rescan should be added before 
rechecking the scan result.

I think a v3 patch is needed, but I am not sure whether to remove the 
GenericTestUtils.waitFor() or to add scanner.reconcile() before every check.
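To illustrate why the swallowed AssertionError ends up reported as this timeout, here is a minimal, self-contained sketch of the polling pattern behind GenericTestUtils.waitFor (the class and method here are simplified stand-ins, not the Hadoop implementation): if nothing between polls refreshes the state the supplier checks, the condition can never flip to true and the call simply burns the whole timeout.

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class WaitFor {

  // Simplified stand-in for GenericTestUtils.waitFor: poll check every
  // intervalMs until it returns true, or fail once timeoutMs has elapsed.
  static void waitFor(Supplier<Boolean> check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (check.get()) {
        return; // condition met
      }
      Thread.sleep(intervalMs);
    }
    throw new TimeoutException("Timed out waiting for condition.");
  }

  public static void main(String[] args) throws Exception {
    // If the checked state is never refreshed between polls (e.g. no rescan
    // runs), the supplier keeps returning false and we always hit the timeout.
    boolean timedOut = false;
    try {
      waitFor(() -> false, 10, 100);
    } catch (TimeoutException e) {
      timedOut = true;
    }
    System.out.println("timedOut=" + timedOut); // prints timedOut=true
  }
}
{code}

This is why running a rescan inside the supplier, before the stats are verified, gives the condition a chance to actually change between polls instead of failing the same way every 50 ms until the timeout fires.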

> TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails 
> intermittently in trunk
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14669
>                 URL: https://issues.apache.org/jira/browse/HDFS-14669
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: datanode
>    Affects Versions: 3.2.0
>         Environment: env free
>            Reporter: qiang Liu
>            Assignee: qiang Liu
>            Priority: Minor
>              Labels: scanner, test
>         Attachments: HDFS-14669-trunk-001.patch, HDFS-14669-trunk.002.patch
>
>
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner#testDirectoryScannerInFederatedCluster
>  randomly fails because it writes files with the same name: the intent is to 
> write 2 files, but both get the same name, which causes a race between the 
> datanode deleting a block and the scan counting blocks.
>  
> Ref :: 
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1207/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testDirectoryScannerInFederatedCluster/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
