[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062703#comment-14062703 ]
Colin Patrick McCabe commented on HDFS-5464: -------------------------------------------- {code} // collect blocks that have not been reported - // all of them are next to the delimiter - Iterator<BlockInfo> it = storageInfo.new BlockIterator(delimiter.getNext(0)); - while(it.hasNext()) - toRemove.add(it.next()); - storageInfo.removeBlock(delimiter); + for(BlockInfo b : storageInfo) { + final long n = b.getNumBytes(); + if (n < 0) { + // reset the length of visited block + b.setNumBytes(-n - 1); + } else { + toRemove.add(b); + } + } {code} Previously, we only ended up looping over the blocks that were not reported. Now, with this change, we'll loop over all blocks in the DataNodeDescriptor. Do you agree? This seems like it will be much slower. Imagine a datanode with 500,000 blocks, none of which have been removed since the previous block report. Previously, this loop would do nothing. Now, with this change, we'll be looping over the full 500,000 blocks again. {code} - \@Test - public void testAddStorage() throws Exception { ... - \@Test - public void testReplaceStorageIfDifferetnOneAlreadyExistedFromSameDataNode() throws Exception { {code} I understand removing {{testBlockListMoveToHead}}, but why remove these other tests? > Simplify block report diff calculation > -------------------------------------- > > Key: HDFS-5464 > URL: https://issues.apache.org/jira/browse/HDFS-5464 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: Tsz Wo Nicholas Sze > Assignee: Tsz Wo Nicholas Sze > Priority: Minor > Attachments: h5464_20131105.patch, h5464_20131105b.patch, > h5464_20131105c.patch, h5464_20140715.patch > > > The current calculation in BlockManager.reportDiff(..) is unnecessarily > complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)