[
https://issues.apache.org/jira/browse/HDFS-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102719#comment-16102719
]
Zhe Zhang commented on HDFS-11896:
----------------------------------
Thanks for the work [~brahmareddy].
I modified the code base to use non-simulated capacity, and added an
intermediate variable for the nonDFS used capacity after one DN is dead but
before it re-registers.
{code}
@Test
public void testNonDFSUsedONDeadNodeReReg() throws Exception {
  Configuration conf = new HdfsConfiguration();
  conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1);
  conf.setInt(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY, 1);
  conf.setInt(DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
      6 * 1000);
  long capacity = 5000L;
  long[] capacities = new long[]{ 4 * capacity, 4 * capacity };
  try {
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    // Baseline totals with both DNs alive
    long initialCapacity = cluster.getNamesystem(0).getCapacityTotal();
    long nonDFS = cluster.getNamesystem(0).getNonDfsUsedSpace();
    assertTrue(initialCapacity > 0);
    DataNode dn1 = cluster.getDataNodes().get(0);
    DataNode dn2 = cluster.getDataNodes().get(1);
    final DatanodeDescriptor dn2Desc = cluster.getNamesystem(0)
        .getBlockManager().getDatanodeManager()
        .getDatanode(dn2.getDatanodeId());
    // Mark dn1 dead; totals should now reflect dn2 only
    dn1.setHeartbeatsDisabledForTests(true);
    cluster.setDataNodeDead(dn1.getDatanodeId());
    assertEquals("Capacity shouldn't include DeadNode", dn2Desc.getCapacity(),
        cluster.getNamesystem(0).getCapacityTotal());
    long nonDFSWithDeadDN = cluster.getNamesystem(0).getNonDfsUsedSpace();
    assertEquals("NonDFS-used shouldn't include DeadNode",
        dn2Desc.getNonDfsUsed(), nonDFSWithDeadDN);
    // Wait for re-registration and heartbeat
    dn1.setHeartbeatsDisabledForTests(false);
    final DatanodeDescriptor dn1Desc = cluster.getNamesystem(0)
        .getBlockManager().getDatanodeManager()
        .getDatanode(dn1.getDatanodeId());
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return dn1Desc.isAlive && dn1Desc.isHeartbeatedSinceRegistration();
      }
    }, 100, 5000);
    // After dn1 re-registers, totals should match the baseline again
    assertEquals("Capacity should be restored after DN re-registration",
        initialCapacity, cluster.getNamesystem(0).getCapacityTotal());
    long nonDfsAfterReg = dn1Desc.getNonDfsUsed() + dn2Desc.getNonDfsUsed();
    LOG.info("nonDFS=" + nonDFS + ",nonDFSWithDeadDN=" + nonDFSWithDeadDN +
        ",nonDfsAfterReg=" + nonDfsAfterReg);
    assertEquals("NonDFS should include actual DN NonDFSUsed", nonDFS,
        nonDfsAfterReg);
  } finally {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
}
{code}
Actually, I don't see a clear difference between the behavior with and without
the patch. Did you observe that the non-dfsUsed number actually doubled? And
does "doubled" here mean that 2x the amount of non-dfsUsed on the dead DN was
added to the Namesystem's overall statistics? If so, do you mind updating the
JIRA description to be more accurate? Thanks.
{code}
// Without patch
nonDFS=884109852672,nonDFSWithDeadDN=442054926336,nonDfsAfterReg=884110409728
nonDFS=884111327232,nonDFSWithDeadDN=442055663616,nonDfsAfterReg=884112097280
nonDFS=884115406848,nonDFSWithDeadDN=442057703424,nonDfsAfterReg=884116340736
// With patch
nonDFS=884110589952,nonDFSWithDeadDN=442055311360,nonDfsAfterReg=884111163392
nonDFS=884116471808,nonDFSWithDeadDN=442058235904,nonDfsAfterReg=884115488768
nonDFS=884118700032,nonDFSWithDeadDN=442059350016,nonDfsAfterReg=884119486464
{code}
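To make the comparison concrete, here is a rough sketch of how the logged numbers could be checked for double counting. The class and method names ({{NonDfsDoublingCheck}}, {{deadNodeShare}}, {{extraAsMultipleOfDeadShare}}) are made up for illustration and are not part of HDFS or the patch. It assumes the dead DN's share can be inferred as {{nonDFS - nonDFSWithDeadDN}}, and it expresses the growth after re-registration as a multiple of that share: a value near 1.0 or 2.0 would suggest the dead DN was counted again, while a value near 0 is just drift between samples.

```java
final class NonDfsDoublingCheck {

  /** Non-DFS used by the DN that went dead, inferred as baseline minus the live DN's value. */
  static long deadNodeShare(long nonDfs, long nonDfsWithDeadDn) {
    return nonDfs - nonDfsWithDeadDn;
  }

  /**
   * Growth of the total after re-registration, as a multiple of the dead DN's share.
   * Roughly 0 means no double counting; roughly 1 or 2 means the share was added again.
   */
  static double extraAsMultipleOfDeadShare(long nonDfs, long nonDfsWithDeadDn,
      long nonDfsAfterReg) {
    long extra = nonDfsAfterReg - nonDfs;
    return (double) extra / deadNodeShare(nonDfs, nonDfsWithDeadDn);
  }

  public static void main(String[] args) {
    // {nonDFS, nonDFSWithDeadDN, nonDfsAfterReg}, copied from the log lines above
    long[][] runs = {
        {884109852672L, 442054926336L, 884110409728L}, // without patch
        {884110589952L, 442055311360L, 884111163392L}, // with patch
    };
    for (long[] r : runs) {
      System.out.println("dead DN share=" + deadNodeShare(r[0], r[1])
          + ", extra as multiple of share="
          + extraAsMultipleOfDeadShare(r[0], r[1], r[2]));
    }
  }
}
```

For the samples above, the multiple is a tiny fraction in both cases, which is why I can't tell the two runs apart.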
Minor: {{long[] capacities}} is unused.
> Non-dfsUsed will be doubled on dead node re-registration
> --------------------------------------------------------
>
> Key: HDFS-11896
> URL: https://issues.apache.org/jira/browse/HDFS-11896
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Priority: Blocker
> Labels: release-blocker
> Attachments: HDFS-11896-002.patch, HDFS-11896-003.patch,
> HDFS-11896-004.patch, HDFS-11896-005.patch, HDFS-11896-006.patch,
> HDFS-11896-007.patch, HDFS-11896-branch-2.7-001.patch,
> HDFS-11896-branch-2.7-002.patch, HDFS-11896-branch-2.7-003.patch,
> HDFS-11896-branch-2.7-004.patch, HDFS-11896-branch-2.7-005.patch,
> HDFS-11896.patch
>
>
> *Scenario:*
> i) Make sure you have non-dfs data.
> ii) Stop the Datanode.
> iii) Wait until it becomes dead.
> iv) Now restart it and check the non-dfs data.