[
https://issues.apache.org/jira/browse/HDFS-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323712#comment-14323712
]
Yi Liu edited comment on HDFS-7740 at 2/17/15 6:32 AM:
-------------------------------------------------------
Sorry for the late update.
I added tests for the above 4 scenarios. So that the tests can freely control
the number of datanodes without affecting other tests, I use separate
MiniDFSClusters for them.
Some explanations of the 4 tests:
{quote}
Create file with 3 DNs up. Kill DN(0). Truncate file. Restart DN(0), make sure
the old replica is disregarded and replaced with the truncated one.
{quote}
For non copy-on-truncate, the new (truncated) block keeps the same block id,
but the GS (GenerationStamp) should increase. In the test, I trigger a block
report for dn0 after it restarts. Since the GS of dn0's replica of the last
block is old, the last block reported from dn0 should be marked corrupt on the
nn, and the replica count of the last block should decrease by 1 on the nn;
the truncated block will then be replicated to dn0. In the test, I check that
the old replica (the block file and block metadata file) is removed and
replaced with the new (truncated) one.
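As a rough illustration of the generation-stamp comparison described above, here is a toy model in plain Java. It is not the patch's test code and does not use any HDFS classes; `StaleReplicaSketch`, `recordBlock`, `truncateInPlace`, and `classify` are hypothetical names, and only the stale-GS case from this comment is modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: names are hypothetical and this is not HDFS code.
final class StaleReplicaSketch {
    /** Outcome the model NameNode assigns to a reported replica. */
    enum ReplicaState { LIVE, CORRUPT, UNKNOWN_BLOCK }

    // blockId -> generation stamp the model NameNode currently expects
    private final Map<Long, Long> expectedGenStamp = new HashMap<>();

    void recordBlock(long blockId, long genStamp) {
        expectedGenStamp.put(blockId, genStamp);
    }

    /** In-place truncate: the block id is unchanged, the GS must move forward. */
    void truncateInPlace(long blockId, long newGenStamp) {
        Long current = expectedGenStamp.get(blockId);
        if (current == null || newGenStamp <= current) {
            throw new IllegalArgumentException("generation stamp must increase");
        }
        expectedGenStamp.put(blockId, newGenStamp);
    }

    /** Classify a replica from a block report; an older GS means a stale replica. */
    ReplicaState classify(long blockId, long reportedGenStamp) {
        Long expected = expectedGenStamp.get(blockId);
        if (expected == null) {
            return ReplicaState.UNKNOWN_BLOCK;
        }
        return reportedGenStamp < expected ? ReplicaState.CORRUPT : ReplicaState.LIVE;
    }

    public static void main(String[] args) {
        StaleReplicaSketch nn = new StaleReplicaSketch();
        nn.recordBlock(1L, 1001L);
        nn.truncateInPlace(1L, 1002L);   // dn0 was down and missed this
        System.out.println("dn0 reports GS 1001: " + nn.classify(1L, 1001L));
        System.out.println("dn1 reports GS 1002: " + nn.classify(1L, 1002L));
    }
}
```

The point of the sketch is only that the block id alone cannot distinguish the stale replica from the truncated one; the GS comparison does.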
{quote}
Kill DN(1). Truncate within the same last block with copy-on-truncate. Restart
DN(1), verify replica consistency.
{quote}
For copy-on-truncate, a new block is created with a new block id and a new GS.
In the test, I trigger a block report for dn1 after it restarts. The new block
has 2 replicas at that point, and it is then replicated to dn1. In the test, I
check that the new block file is replicated to dn1, and that the old replica
still exists too because there is a snapshot.
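The replica bookkeeping in this scenario can be sketched with a small in-memory model. This is an assumption-laden illustration, not HDFS code: `CopyOnTruncateSketch` and all of its methods are hypothetical, datanodes are just names mapped to sets of block ids, and the old block is kept on every node to stand in for the snapshot reference.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: names are hypothetical and this is not HDFS code.
final class CopyOnTruncateSketch {
    private final Map<String, Set<Long>> replicasByNode = new LinkedHashMap<>();
    private final Set<String> liveNodes = new HashSet<>();

    void addNode(String name, long blockId) {
        Set<Long> blocks = new HashSet<>();
        blocks.add(blockId);
        replicasByNode.put(name, blocks);
        liveNodes.add(name);
    }

    void stop(String name) { liveNodes.remove(name); }

    void restart(String name) { liveNodes.add(name); }

    /** Live nodes holding the old block get the new block; the old one is kept. */
    void copyOnTruncate(long oldBlockId, long newBlockId) {
        for (String node : liveNodes) {
            Set<Long> blocks = replicasByNode.get(node);
            if (blocks.contains(oldBlockId)) {
                blocks.add(newBlockId);
            }
        }
    }

    int liveReplicaCount(long blockId) {
        int count = 0;
        for (String node : liveNodes) {
            if (replicasByNode.get(node).contains(blockId)) {
                count++;
            }
        }
        return count;
    }

    /** Copy the block to live nodes lacking it until the target is reached. */
    int replicate(long blockId, int targetReplicas) {
        if (liveReplicaCount(blockId) == 0) {
            return 0; // no live source replica to copy from
        }
        int copied = 0;
        for (Map.Entry<String, Set<Long>> entry : replicasByNode.entrySet()) {
            if (liveReplicaCount(blockId) >= targetReplicas) {
                break;
            }
            if (liveNodes.contains(entry.getKey()) && entry.getValue().add(blockId)) {
                copied++;
            }
        }
        return copied;
    }

    boolean hasReplica(String node, long blockId) {
        return replicasByNode.get(node).contains(blockId);
    }

    public static void main(String[] args) {
        CopyOnTruncateSketch cluster = new CopyOnTruncateSketch();
        for (String dn : new String[] {"dn0", "dn1", "dn2"}) {
            cluster.addNode(dn, 1L);
        }
        cluster.stop("dn1");
        cluster.copyOnTruncate(1L, 2L);
        System.out.println("new block replicas: " + cluster.liveReplicaCount(2L));
        cluster.restart("dn1");
        cluster.replicate(2L, 3);
        System.out.println("dn1 has new block: " + cluster.hasReplica("dn1", 2L));
        System.out.println("dn1 still has old block: " + cluster.hasReplica("dn1", 1L));
    }
}
```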
{quote}
Create a single block file with 3 replicas. Truncate mid of block and then
immediately restart 2 of the DNs. Check the files
{quote}
In the test, I restart dn0 and dn1 immediately after the truncate, and check
that the old replica is removed and replaced with the truncated one on dn0 and
dn1.
{quote}
Same as before except completely shutting down 3 of the DNs but not restarting
them.
{quote}
In the test, I check that the truncated block remains under construction after
the 3 datanodes are shut down.
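One way to express a "remains in this state" check is to poll the condition several times and fail as soon as it stops holding. The helper below is a minimal sketch of that idea in plain Java; `StaysTrueSketch.staysTrue` is a hypothetical name, and how the real test queries the block state is not shown here.

```java
import java.util.function.BooleanSupplier;

// Toy helper only: the name is hypothetical and this is not HDFS code.
final class StaysTrueSketch {
    /**
     * Returns true only if the condition holds on every one of the given
     * number of checks, sleeping between checks. Returns false on the first
     * failed check or if the thread is interrupted.
     */
    static boolean staysTrue(BooleanSupplier condition, int checks, long sleepMillis) {
        for (int i = 0; i < checks; i++) {
            if (!condition.getAsBoolean()) {
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass a condition such as "the last block is still under construction" as the supplier; the number of checks and the sleep interval are tuning choices.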
> Test truncate with DataNodes restarting
> ---------------------------------------
>
> Key: HDFS-7740
> URL: https://issues.apache.org/jira/browse/HDFS-7740
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Affects Versions: 2.7.0
> Reporter: Konstantin Shvachko
> Assignee: Yi Liu
> Fix For: 2.7.0
>
> Attachments: HDFS-7740.001.patch
>
>
> Add a test case, which ensures replica consistency when DNs are failing and
> restarting.