[jira] [Commented] (HDFS-5464) Simplify block report diff calculation

Colin Patrick McCabe (JIRA) Tue, 15 Jul 2014 14:51:33 -0700

    [ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062703#comment-14062703 ]


Colin Patrick McCabe commented on HDFS-5464:
--------------------------------------------

{code}
     // collect blocks that have not been reported
-    // all of them are next to the delimiter
-    Iterator<BlockInfo> it = storageInfo.new BlockIterator(delimiter.getNext(0));
-    while(it.hasNext())
-      toRemove.add(it.next());
-    storageInfo.removeBlock(delimiter);
+    for(BlockInfo b : storageInfo) {
+      final long n = b.getNumBytes();
+      if (n < 0) {
+        // reset the length of visited block 
+        b.setNumBytes(-n - 1);
+      } else {
+        toRemove.add(b);
+      }
+    }
{code}

Previously, we only ended up looping over the blocks that were not reported.  
Now, with this change, we'll loop over all blocks in the DataNodeDescriptor.  
Do you agree?

This seems like it will be much slower.  Imagine a datanode with 500,000 
blocks, none of which have been removed since the previous block report.  
Previously, this loop would do nothing.  Now, with this change, we'll be 
looping over the full 500,000 blocks again.
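
To make the cost difference concrete, here is a minimal, self-contained sketch. It is not the BlockManager code: {{Block}}, {{markReported}}, {{fullScan}} and {{tailScan}} are hypothetical names, the list is a plain ArrayList rather than the real block list, and the delimiter is modeled as an index ({{firstUnreported}}). It only counts how many blocks each approach touches when every block has been reported.

```java
import java.util.ArrayList;
import java.util.List;

class ReportDiffSketch {
    /** Hypothetical stand-in for BlockInfo; only the length field matters here. */
    static final class Block {
        long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
    }

    /** Assumed marking convention: a reported block has its length mapped to -n - 1. */
    static void markReported(Block b) {
        b.numBytes = -b.numBytes - 1;
    }

    /**
     * Patched approach: walk every block, restore the length of marked blocks,
     * and collect the unmarked ones. Returns the number of blocks visited.
     */
    static int fullScan(List<Block> storage, List<Block> toRemove) {
        int visited = 0;
        for (Block b : storage) {
            visited++;
            final long n = b.numBytes;
            if (n < 0) {
                b.numBytes = -n - 1; // reset the length of a visited block
            } else {
                toRemove.add(b);
            }
        }
        return visited;
    }

    /**
     * Previous approach, simplified: reported blocks sit before the delimiter,
     * so only the tail starting at firstUnreported is walked.
     */
    static int tailScan(List<Block> storage, int firstUnreported, List<Block> toRemove) {
        int visited = 0;
        for (Block b : storage.subList(firstUnreported, storage.size())) {
            visited++;
            toRemove.add(b);
        }
        return visited;
    }

    public static void main(String[] args) {
        final int total = 500_000;
        List<Block> storage = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            Block b = new Block(i);
            markReported(b); // every block was in the report
            storage.add(b);
        }

        // Nothing is unreported, so the tail starts at the end of the list.
        List<Block> removedByTail = new ArrayList<>();
        int tailSteps = tailScan(storage, total, removedByTail);

        List<Block> removedByFull = new ArrayList<>();
        int fullSteps = fullScan(storage, removedByFull);

        System.out.println("tail scan visited " + tailSteps + " blocks");
        System.out.println("full scan visited " + fullSteps + " blocks");
    }
}
```

Both variants end up with an empty {{toRemove}} in this scenario, but the full scan still touches all 500,000 entries to restore their lengths, while the tail scan touches none.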

{code}
-  @Test
-  public void testAddStorage() throws Exception {
...
-  @Test
-  public void testReplaceStorageIfDifferetnOneAlreadyExistedFromSameDataNode() throws Exception {
{code}

I understand removing {{testBlockListMoveToHead}}, but why remove these other 
tests?

> Simplify block report diff calculation
> --------------------------------------
>
>                 Key: HDFS-5464
>                 URL: https://issues.apache.org/jira/browse/HDFS-5464
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>            Priority: Minor
>         Attachments: h5464_20131105.patch, h5464_20131105b.patch, 
> h5464_20131105c.patch, h5464_20140715.patch
>
>
> The current calculation in BlockManager.reportDiff(..) is unnecessarily 
> complicated.  We could simplify the calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
