[ 
https://issues.apache.org/jira/browse/HDFS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-1562:
------------------------------

    Attachment: hdfs-1562-3.patch

Hey Matt,

Thanks for reviewing! Updated patch attached.

* Addresses HDFS-1828 by making waitForReplication check for exact values
* Added a comment next to each config option being set, explaining the rationale.
* Folds all utility methods into DFSTestUtil. I used NameNodeAdapter for 
waitForReplication since it needs access to protected NameNode methods. This 
method is needed in addition to waitReplication because it checks for specific 
values of neededReplications that are not exposed via the FileSystem API (the 
test is more fine-grained); see the first sketch below.
* Good point WRT waitForCorruptReplicas. The test actually has the opposite 
problem: it explicitly attempts to report the corrupt replica from the client 
(via file access) because datanode-side checking takes so long (the 
DataBlockScanner period is measured in hours, so it doesn't execute during the 
test runs). In the test, after the client reports the corrupt block to the 
namenode, it immediately queries the namenode state to check that a corrupt 
replica has been identified so it can wait for replication. After looping the 
test, however, I discovered a problem with this approach too: sometimes the 
client only accesses the non-corrupt block location and therefore never 
triggers detection of the corrupt replica. The code for testing corrupt 
replicas in TestDatanodeBlockScanner (restart the DN, which triggers block 
scanning) looks sound, so I refactored it into a new method 
(DFSTestUtil#waitCorruptReplicas) and used it here; see the second sketch below. 
* Also refactored TestDatanodeBlockScanner to use waitReplication and the new 
methods waitCorruptReplicas and isBlockCorrupt. 
* Removes TestDatanodeBlockScanner#corruptReplica in favor of 
MiniDFSCluster#corruptReplica (same implementation)
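
For reference, here's a minimal sketch of the exact-value polling pattern behind 
waitForReplication. The BlockCounts interface and the names below are stand-ins 
(assumptions) for the protected FSNamesystem state the real helper reads via 
NameNodeAdapter; this is not the patch code itself.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of an exact-value replication wait; not the actual DFSTestUtil code. */
class ReplicationWaitSketch {
  /** Hypothetical view of the counters the real helper reads via NameNodeAdapter. */
  interface BlockCounts {
    int liveReplicas();        // live replicas of the block under test
    int neededReplications();  // entries for the block in the neededReplications queue
  }

  static void waitForExactReplication(BlockCounts counts, int expectedReplicas,
      int expectedNeeded, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    // Per the HDFS-1828 fix, check for exact values rather than "at least N",
    // so over-replication and stale queue entries also fail the test.
    while (counts.liveReplicas() != expectedReplicas
        || counts.neededReplications() != expectedNeeded) {
      if (System.nanoTime() > deadline) {
        throw new TimeoutException("Expected " + expectedReplicas + " replicas and "
            + expectedNeeded + " needed replications, saw "
            + counts.liveReplicas() + "/" + counts.neededReplications());
      }
      Thread.sleep(500);
    }
  }
}
{code}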
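
And a sketch of the waitCorruptReplicas idea: restart the datanode holding the 
corrupt replica so its block scanner re-verifies the block, then poll the 
namenode's corrupt-replica count. The restart hook and CorruptReplicaView below 
are hypothetical stand-ins for the real MiniDFSCluster / CorruptReplicasMap 
plumbing.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of the corrupt-replica wait; not the actual DFSTestUtil code. */
class CorruptReplicaWaitSketch {
  /** Hypothetical accessor for the namenode's corrupt-replica count for one block. */
  interface CorruptReplicaView {
    int numCorruptReplicas();
  }

  static void waitCorruptReplicas(Runnable restartDataNode, CorruptReplicaView view,
      int expectedCorrupt, long timeoutMs)
      throws InterruptedException, TimeoutException {
    // Restarting the DN forces its block scanner to re-verify local replicas,
    // which reliably reports the corruption to the namenode (unlike client reads,
    // which may only touch the healthy replica).
    restartDataNode.run();

    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (view.numCorruptReplicas() != expectedCorrupt) {
      if (System.nanoTime() > deadline) {
        throw new TimeoutException("Expected " + expectedCorrupt
            + " corrupt replicas, saw " + view.numCorruptReplicas());
      }
      Thread.sleep(500);
    }
  }
}
{code}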

I've looped the test using this patch and so far have seen no failures.

Thanks,
Eli

> Add rack policy tests
> ---------------------
>
>                 Key: HDFS-1562
>                 URL: https://issues.apache.org/jira/browse/HDFS-1562
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: name-node, test
>    Affects Versions: 0.23.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hdfs-1562-1.patch, hdfs-1562-2.patch, hdfs-1562-3.patch
>
>
> The existing replication tests (TestBlocksWithNotEnoughRacks, 
> TestPendingReplication, TestOverReplicatedBlocks, TestReplicationPolicy, 
> TestUnderReplicatedBlocks, and TestReplication) are missing tests for rack 
> policy violations.  This jira adds the following tests which I created when 
> generating a new patch for HDFS-15.
> * Test that blocks that have a sufficient number of total replicas, but are 
> not replicated cross rack, get replicated cross rack when a rack becomes 
> available.
> * Test that new blocks for an underreplicated file will get replicated cross 
> rack. 
> * Mark a block as corrupt and test that when it is re-replicated it is still 
> replicated across racks.
> * Reduce the replication factor of a file, making sure that the only 
> cross-rack replica is not removed when deleting replicas.
> * Test that when a block is replicated because a replica is lost due to host 
> failure, the rack policy is preserved.
> * Test that when the excess replicas of a block are reduced due to a node 
> re-joining the cluster, the rack policy is not violated.
> * Test that rack policy is still respected when blocks are replicated due to 
> node decommissioning.
> * Test that rack policy is still respected when blocks are replicated due to 
> node decommissioning, even when the blocks are over-replicated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
