Eli Collins created HDFS-3931:
---------------------------------
Summary: TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
Key: HDFS-3931
URL: https://issues.apache.org/jira/browse/HDFS-3931
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Andy Isaacson
Per Andy's comment on HDFS-3902:
TestDatanodeBlockScanner still fails in about 1 of 5 runs, in
testBlockCorruptionRecoveryPolicy2. That's due to a separate test issue also
uncovered by HDFS-3828.
The failure scenario for this one is a bit more tricky. I think I've captured
the scenario below:
- The test corrupts 2/3 replicas.
- The client reports a bad block.
- NN asks a DN to re-replicate, and randomly picks the other corrupt replica.
- DN notices the incoming replica is corrupt and reports it as a bad block, but
does not inform the NN that re-replication failed.
- NN keeps the block on pendingReplications.
- The BP scanner wakes up on both DNs with corrupt blocks, and both report
corruption. NN reports both as duplicates: one of the client's report and one
of the DN's report above.
- Since the block is on pendingReplications, NN does not schedule another
replication.
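
The steps above can be replayed in a toy model. This is only a minimal sketch, assuming an NN that tracks a single block with one pending flag and a set of corrupt replicas. The class, its methods, and the DN names are hypothetical and are not HDFS APIs; the real NN bookkeeping is more involved.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the scenario above. Not Hadoop code; every name is hypothetical. */
public class PendingReplicationSketch {
    // Replicas (by DN name) that the toy NN has recorded as corrupt.
    private final Set<String> corruptReplicas = new LinkedHashSet<>();
    // True while the toy NN believes a re-replication is still in flight.
    private boolean pending = false;
    // Number of re-replications the toy NN has scheduled.
    private int scheduled = 0;

    /** Handles a bad-block report; returns false if it duplicates an earlier one. */
    boolean reportBadBlock(String dn) {
        boolean isNew = corruptReplicas.add(dn);
        // Schedule work only for a new report, and only if nothing is pending.
        if (isNew && !pending) {
            pending = true;
            scheduled++;
        }
        return isNew;
    }

    boolean isPending() { return pending; }

    int scheduledCount() { return scheduled; }

    /** Replays the listed steps against the toy NN. */
    static PendingReplicationSketch runScenario() {
        PendingReplicationSketch nn = new PendingReplicationSketch();
        // The client reports the corrupt replica on dn1; the NN schedules a copy.
        nn.reportBadBlock("dn1");
        // The copy is sourced from dn2, the other corrupt replica. The receiving
        // DN rejects it and reports dn2, but never tells the NN that the copy
        // failed, so nothing clears the pending flag.
        nn.reportBadBlock("dn2");
        // The scanners on dn1 and dn2 report again; both are duplicates.
        nn.reportBadBlock("dn1");
        nn.reportBadBlock("dn2");
        return nn;
    }

    public static void main(String[] args) {
        PendingReplicationSketch nn = runScenario();
        System.out.println("scheduled=" + nn.scheduledCount()
            + " pending=" + nn.isPending());
    }
}
```

In this model the only replication ever scheduled is the one that silently failed, and the pending flag stays set, which mirrors the stuck state described in the last step.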