[ 
https://issues.apache.org/jira/browse/HDFS-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102719#comment-16102719
 ] 

Zhe Zhang commented on HDFS-11896:
----------------------------------

Thanks for the work [~brahmareddy].

I modified the test to use non-simulated capacity, and added an 
intermediate variable that captures the non-DFS used capacity after one DN is 
dead but before it re-registers.
{code}
  @Test
  public void testNonDFSUsedONDeadNodeReReg() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1);
    conf.setInt(DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY, 1);
    conf.setInt(DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY,
        6 * 1000);
    long capacity = 5000L;
    long[] capacities = new long[]{ 4 * capacity, 4 * capacity };
    try {
      cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
      long initialCapacity = cluster.getNamesystem(0).getCapacityTotal();
      long nonDFS = cluster.getNamesystem(0).getNonDfsUsedSpace();
      assertTrue(initialCapacity > 0);
      DataNode dn1 = cluster.getDataNodes().get(0);
      DataNode dn2 = cluster.getDataNodes().get(1);
      final DatanodeDescriptor dn2Desc = cluster.getNamesystem(0)
          .getBlockManager().getDatanodeManager()
          .getDatanode(dn2.getDatanodeId());
      dn1.setHeartbeatsDisabledForTests(true);
      cluster.setDataNodeDead(dn1.getDatanodeId());
      assertEquals("Capacity shouldn't include DeadNode", dn2Desc.getCapacity(),
          cluster.getNamesystem(0).getCapacityTotal());
      long nonDFSWithDeadDN = cluster.getNamesystem(0).getNonDfsUsedSpace();
      assertEquals("NonDFS-used shouldn't include DeadNode",
          dn2Desc.getNonDfsUsed(), nonDFSWithDeadDN);
      // Wait for re-registration and heartbeat
      dn1.setHeartbeatsDisabledForTests(false);
      final DatanodeDescriptor dn1Desc = cluster.getNamesystem(0)
          .getBlockManager().getDatanodeManager()
          .getDatanode(dn1.getDatanodeId());
      GenericTestUtils.waitFor(new Supplier<Boolean>() {

        @Override
        public Boolean get() {
          return dn1Desc.isAlive && dn1Desc.isHeartbeatedSinceRegistration();
        }
      }, 100, 5000);
      assertEquals("Capacity should match initial after re-registration",
          initialCapacity, cluster.getNamesystem(0).getCapacityTotal());
      long nonDfsAfterReg = dn1Desc.getNonDfsUsed() + dn2Desc.getNonDfsUsed();
      LOG.info("nonDFS=" + nonDFS + ",nonDFSWithDeadDN=" + nonDFSWithDeadDN +
              ",nonDfsAfterReg=" + nonDfsAfterReg);
      assertEquals("NonDFS should include actual DN NonDFSUsed", nonDFS,
          nonDfsAfterReg);
    } finally {
      if (cluster != null) {
        cluster.shutdown();
      }
    }
  }
{code}

With this test I don't see a clear difference in behavior with and without 
the patch. Did you observe that the non-DFS used number actually doubled? And 
does "doubled" here mean that 2x the amount of non-DFS used on the dead DN was 
added to the Namesystem's overall statistics? If so, do you mind updating the 
JIRA description to be more accurate? Thanks.
{code}
// Without patch
nonDFS=884109852672,nonDFSWithDeadDN=442054926336,nonDfsAfterReg=884110409728
nonDFS=884111327232,nonDFSWithDeadDN=442055663616,nonDfsAfterReg=884112097280
nonDFS=884115406848,nonDFSWithDeadDN=442057703424,nonDfsAfterReg=884116340736

// With patch
nonDFS=884110589952,nonDFSWithDeadDN=442055311360,nonDfsAfterReg=884111163392
nonDFS=884116471808,nonDFSWithDeadDN=442058235904,nonDfsAfterReg=884115488768
nonDFS=884118700032,nonDFSWithDeadDN=442059350016,nonDfsAfterReg=884119486464
{code}
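
To make the comparison concrete, here is a minimal sketch (not part of the patch) that parses one of the log lines above and compares the observed {{nonDfsAfterReg}} with what it would be if the dead DN's share had been counted twice. The class and method names are hypothetical, and the "doubled" formula is only my reading of the JIRA title: it assumes the dead DN's share is {{nonDFS - nonDFSWithDeadDN}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the "key=value,key=value" lines logged by the test.
class NonDfsDelta {

  /** Parses one log line into an ordered key -> value map. */
  static Map<String, Long> parse(String line) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (String kv : line.trim().split(",")) {
      String[] parts = kv.split("=", 2);
      out.put(parts[0].trim(), Long.parseLong(parts[1].trim()));
    }
    return out;
  }

  /** Observed change in non-DFS used between cluster start and re-registration. */
  static long delta(String line) {
    Map<String, Long> m = parse(line);
    long before = m.get("nonDFS");
    long after = m.get("nonDfsAfterReg");
    return after - before;
  }

  /**
   * Value nonDfsAfterReg would take if the dead DN's share were added twice.
   * Assumption: dead DN share = nonDFS - nonDFSWithDeadDN.
   */
  static long expectedIfDoubled(String line) {
    Map<String, Long> m = parse(line);
    long total = m.get("nonDFS");
    long liveOnly = m.get("nonDFSWithDeadDN");
    long deadShare = total - liveOnly;
    return liveOnly + 2 * deadShare;
  }

  public static void main(String[] args) {
    // First "Without patch" line from the output above.
    String line = "nonDFS=884109852672,nonDFSWithDeadDN=442054926336,"
        + "nonDfsAfterReg=884110409728";
    System.out.println("observed delta (bytes): " + delta(line));
    System.out.println("after-reg if doubled:   " + expectedIfDoubled(line));
    System.out.println("after-reg observed:     " + parse(line).get("nonDfsAfterReg"));
  }
}
```

In every run listed above, with or without the patch, the observed value stays within about a megabyte of the initial {{nonDFS}}, far from the doubled figure.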

Minor: {{long[] capacities}} is unused.

> Non-dfsUsed will be doubled on dead node re-registration
> --------------------------------------------------------
>
>                 Key: HDFS-11896
>                 URL: https://issues.apache.org/jira/browse/HDFS-11896
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: HDFS-11896-002.patch, HDFS-11896-003.patch, 
> HDFS-11896-004.patch, HDFS-11896-005.patch, HDFS-11896-006.patch, 
> HDFS-11896-007.patch, HDFS-11896-branch-2.7-001.patch, 
> HDFS-11896-branch-2.7-002.patch, HDFS-11896-branch-2.7-003.patch, 
> HDFS-11896-branch-2.7-004.patch, HDFS-11896-branch-2.7-005.patch, 
> HDFS-11896.patch
>
>
>  *Scenario:* 
> i) Make sure you have non-DFS data.
> ii) Stop the DataNode.
> iii) Wait until it becomes dead.
> iv) Restart it and check the non-DFS data.


