[
https://issues.apache.org/jira/browse/HDFS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Collins updated HDFS-1562:
------------------------------
Attachment: hdfs-1562-3.patch
Hey Matt,
Thanks for reviewing! Updated patch attached.
* Addresses HDFS-1828 by making waitForReplication check for exact values
* Added a comment by each config option being set with rationale
* Folds all utility methods into DFSTestUtil. I used the NameNodeAdapter for
waitForReplication since it uses protected methods. This method is needed in
addition to waitReplication because it checks for specific values of
neededReplications that are not exposed via the FileSystem API (the test is
more fine-grained).
* Good point WRT waitForCorruptReplicas. The test actually has the opposite
problem: it explicitly attempts to report the corrupt replica from the client
(via file access) because datanode-side checking takes so long (the
DataBlockScanner period is measured in hours, so it doesn't execute during the
test runs). In the test, after the client reports the corrupt block to the
namenode, it immediately queries the namenode state to check that a corrupt
replica has been identified so it can wait for replication. After looping this
test, however, I discovered a problem with this approach too: sometimes the
client only accesses the non-corrupt block location and therefore doesn't
trigger detection of the corrupt replica. The code for testing corrupt
replicas in TestDatanodeBlockScanner (restart the DN, which triggers block
scanning) looks sound, so I refactored it out into a new method
(DFSTestUtil#waitCorruptReplicas) and used it here.
* Also refactored TestDatanodeBlockScanner to use waitReplication and new
methods waitCorruptReplicas and isBlockCorrupt.
* Removes TestDatanodeBlockScanner#corruptReplica in favor of
MiniDFSCluster#corruptReplica (same implementation)
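
For readers without the patch at hand, the "check for exact values" idea in the first bullet can be sketched roughly as below. This is a generic polling helper, not the code in the patch and not a Hadoop API: the class name ExactValueWait, the method waitForExactValue, and its parameters are all hypothetical, and the supplier stands in for whatever counter the test reads (for example a neededReplications value).

```java
import java.util.function.IntSupplier;

/** Hypothetical helper (not part of Hadoop): polls a counter until it equals an exact value. */
public class ExactValueWait {

    /**
     * Returns true as soon as current.getAsInt() == expected, or false if the
     * timeout elapses first. The comparison is deliberately '==' rather than
     * '>=' so that an overshoot is reported as a failure instead of a pass.
     */
    public static boolean waitForExactValue(IntSupplier current, int expected,
                                            long timeoutMillis, long pollMillis) {
        final long start = System.nanoTime();
        final long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if (current.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would pass a lambda that reads the live value and assert on the boolean result; with an exact comparison, a state that settles at the wrong value times out rather than passing silently.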
I've looped the test using this patch and so far have seen no failures.
Thanks,
Eli
> Add rack policy tests
> ---------------------
>
> Key: HDFS-1562
> URL: https://issues.apache.org/jira/browse/HDFS-1562
> Project: Hadoop HDFS
> Issue Type: Test
> Components: name-node, test
> Affects Versions: 0.23.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Attachments: hdfs-1562-1.patch, hdfs-1562-2.patch, hdfs-1562-3.patch
>
>
> The existing replication tests (TestBlocksWithNotEnoughRacks,
> TestPendingReplication, TestOverReplicatedBlocks, TestReplicationPolicy,
> TestUnderReplicatedBlocks, and TestReplication) are missing tests for rack
> policy violations. This jira adds the following tests which I created when
> generating a new patch for HDFS-15.
> * Test that blocks that have a sufficient number of total replicas, but are
> not replicated cross rack, get replicated cross rack when a rack becomes
> available.
> * Test that new blocks for an underreplicated file will get replicated cross
> rack.
> * Mark a block as corrupt, and test that when it is re-replicated it is
> still replicated across racks.
> * Reduce the replication factor of a file, making sure that the only block
> that is across racks is not removed when deleting replicas.
> * Test that when a block is replicated because a replica is lost due to host
> failure the rack policy is preserved.
> * Test that when the excess replicas of a block are reduced due to a node
> re-joining the cluster the rack policy is not violated.
> * Test that rack policy is still respected when blocks are replicated due to
> node decommissioning.
> * Test that rack policy is still respected when blocks are replicated due to
> node decommissioning, even when the blocks are over-replicated.