[
https://issues.apache.org/jira/browse/HDFS-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893874#comment-16893874
]
qiang Liu commented on HDFS-14669:
----------------------------------
Thanks [~ayushtkn] for your review. I rechecked the error; it is a timeout,
something like this:
{panel:title=the timeout error}
java.util.concurrent.TimeoutException: Timed out waiting for condition. Thread
diagnostics:
Timestamp: 2019-07-26 02:45:04,335
{panel}
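I think this misleading exception is caused by checking the scan result with
GenericTestUtils.waitFor: the AssertionError thrown by verifyStats() is caught
and turned into "return false", so once the wait expires only the timeout is
reported: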
{code:java}
GenericTestUtils.waitFor(() -> {
  try {
    bpid = cluster.getNamesystem(1).getBlockPoolId();
    verifyStats(bp1Files, 0, 0, 0, 0, 0, 0);
    bpid = cluster.getNamesystem(3).getBlockPoolId();
    verifyStats(bp2Files, 0, 0, 0, 0, 0, 0);
  } catch (AssertionError ex) {
    // The assertion message is swallowed here; only a timeout is reported.
    return false;
  }
  return true;
}, 50, 2000); // re-check every 50 ms, give up after 2000 ms
{code}
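To illustrate why only a bare TimeoutException surfaces, here is a minimal, self-contained sketch. WaitForSketch and its waitFor() are hypothetical, simplified stand-ins for GenericTestUtils.waitFor (not the Hadoop implementation), and staleMissingBlocks is an invented value standing in for a scan result that is never refreshed.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  // Hypothetical, simplified stand-in for GenericTestUtils.waitFor: evaluates
  // `check` every checkEveryMillis until it returns true or waitForMillis passes.
  public static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        // Only a generic timeout escapes; why the check failed is not reported.
        throw new TimeoutException("Timed out waiting for condition.");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int staleMissingBlocks = 1; // hypothetical scan result that nothing refreshes
    try {
      waitFor(() -> {
        try {
          // Stands in for verifyStats(...) failing on the stale result.
          if (staleMissingBlocks != 0) {
            throw new AssertionError(
                "expected 0 missing blocks, got " + staleMissingBlocks);
          }
        } catch (AssertionError ex) {
          return false; // the assertion message is dropped here
        }
        return true;
      }, 50, 200);
    } catch (TimeoutException e) {
      System.out.println(e.getMessage()); // prints "Timed out waiting for condition."
    }
  }
}
```

The AssertionError text ("expected 0 missing blocks, got 1") never reaches the test report, which matches the timeout shown in the panel above.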
In addition, I really wonder whether this kind of check is necessary, i.e. is
there a chance that a file's block has not really been created after
fs.create() has returned? Anyway, even if there is such a chance, a rescan
should be added before rechecking the scan result; otherwise every retry just
re-checks the same scan result.
I think a v3 patch is needed, but I am not sure whether to remove the
GenericTestUtils.waitFor() or to add scanner.reconcile() before every check.
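The second option (rescanning before every check) can be sketched as a small, deterministic model. Everything here is hypothetical: FakeScanner, blocksOnDisk, poll() and converges() are not the DirectoryScanner API; the model only assumes that the scanner's counts are a snapshot refreshed by reconcile(), and it performs the block deletion synchronously (in the real test it happens in the background) so the result is repeatable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class RescanBeforeCheckSketch {
  // Hypothetical model: number of block files currently on the datanode disk.
  static final AtomicInteger blocksOnDisk = new AtomicInteger();

  // Hypothetical scanner: its count is a snapshot refreshed only by reconcile().
  static class FakeScanner {
    private volatile int scannedBlocks;

    void reconcile() {
      scannedBlocks = blocksOnDisk.get();
    }

    int scannedBlocks() {
      return scannedBlocks;
    }
  }

  // Simplified polling loop: true if `check` passes before the timeout.
  static boolean poll(BooleanSupplier check, long everyMillis, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      Thread.sleep(everyMillis);
    }
    return true;
  }

  // Scans while a stale block still exists, deletes it, then polls for 1 block.
  static boolean converges(boolean rescanBeforeCheck) throws InterruptedException {
    blocksOnDisk.set(2);
    FakeScanner scanner = new FakeScanner();
    scanner.reconcile(); // snapshot records 2 blocks
    blocksOnDisk.set(1); // the stale block is deleted after the scan
    return poll(() -> {
      if (rescanBeforeCheck) {
        scanner.reconcile(); // refresh the snapshot before each check
      }
      return scanner.scannedBlocks() == 1;
    }, 50, 300);
  }

  public static void main(String[] args) throws InterruptedException {
    // Without a rescan the old snapshot never changes, so polling only times out.
    System.out.println(converges(false)); // prints false
    // With a rescan before each check, the first poll already sees 1 block.
    System.out.println(converges(true)); // prints true
  }
}
```

Keeping the rescan inside the polled check keeps the retry meaningful even when deletion is asynchronous; dropping waitFor() instead would rely on the deletion having finished before the single scan.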
> TestDirectoryScanner#testDirectoryScannerInFederatedCluster fails
> intermittently in trunk
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-14669
> URL: https://issues.apache.org/jira/browse/HDFS-14669
> Project: Hadoop HDFS
> Issue Type: Test
> Components: datanode
> Affects Versions: 3.2.0
> Environment: env free
> Reporter: qiang Liu
> Assignee: qiang Liu
> Priority: Minor
> Labels: scanner, test
> Attachments: HDFS-14669-trunk-001.patch, HDFS-14669-trunk.002.patch
>
>
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner#testDirectoryScannerInFederatedCluster
> fails randomly because it writes files with the same name: it intends to
> write 2 files, but both have the same name, which causes a race between the
> datanode deleting a block and the scanner counting blocks.
>
> Ref ::
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1207/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testDirectoryScannerInFederatedCluster/]
>
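The same-name race described in the issue can be modelled in a few lines. This is a simplified sketch, assuming the files are created with overwrite enabled (the default for FileSystem.create(Path)), so creating a path that already exists replaces the old file and leaves its block to be removed from the datanode later; orphanedBlocks() and the map-based namespace are hypothetical, not HDFS code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SameNameCollisionSketch {
  // Creates each path in order and returns how many blocks were orphaned,
  // i.e. replaced by a later create of the same path and left for deletion.
  static int orphanedBlocks(List<String> paths) {
    Map<String, Long> namespace = new HashMap<>(); // path -> current block id
    List<Long> pendingDeletion = new ArrayList<>();
    long nextBlockId = 1;
    for (String path : paths) {
      Long replaced = namespace.put(path, nextBlockId++);
      if (replaced != null) {
        // In HDFS this deletion happens in the background and can race
        // with a directory scan that is counting blocks.
        pendingDeletion.add(replaced);
      }
    }
    return pendingDeletion.size();
  }

  public static void main(String[] args) {
    // Same name twice: the second create replaces the first file.
    System.out.println(orphanedBlocks(Arrays.asList("/file", "/file"))); // prints 1
    // A per-file index in the name keeps the two files distinct.
    System.out.println(orphanedBlocks(Arrays.asList("/file0", "/file1"))); // prints 0
  }
}
```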
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)