getBlockArray in DatanodeDescriptor does not honor passed-in maxblocks value
----------------------------------------------------------------------------

                 Key: HADOOP-4483
                 URL: https://issues.apache.org/jira/browse/HADOOP-4483
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.1
         Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
            Reporter: Ahad Rana
            Priority: Critical


The getBlockArray method in DatanodeDescriptor.java should honor the passed-in 
maxblocks parameter. In its current form it passes an array sized to 
min(maxblocks, blocks.size()) into the Collection.toArray method. As the 
javadoc for Collection.toArray indicates, toArray discards the passed-in array 
(and allocates a new one) if the number of elements in the collection exceeds 
the size of the passed-in array. As a result, the flawed implementation of this 
method returns all the invalid blocks for a data node in one go, and thus 
causes the NameNode to send a DNA_INVALIDATE command to the DataNode with an 
excessively large number of blocks. This INVALIDATE command, in turn, can take 
a very long time to process at the DataNode, and since DatanodeCommand(s) are 
processed in between heartbeats at the DataNode, the NameNode then considers 
the DataNode to be offline / unresponsive (due to a lack of heartbeats). 
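
The toArray behavior described above can be reproduced outside of Hadoop. The 
following is a minimal sketch, not the actual DatanodeDescriptor code: the class 
name BoundedToArrayDemo, both method names, and the use of String in place of 
Block are assumptions made for illustration. The first method follows the 
pattern described in this report; the second shows one possible way to cap the 
result at maxblocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the logic discussed in this issue; Strings are
// used instead of Hadoop's Block class so the example is self-contained.
public class BoundedToArrayDemo {

    // Pattern described in the report: the array is sized to
    // min(maxblocks, blocks.size()), but Collection.toArray allocates a new,
    // full-sized array whenever the passed-in array is too small.
    public static String[] naiveGetBlockArray(Collection<String> blocks, int maxblocks) {
        int n = Math.max(0, Math.min(maxblocks, blocks.size()));
        return blocks.toArray(new String[n]);
    }

    // One possible fix (an assumption, not necessarily the committed patch):
    // copy at most maxblocks elements by hand and remove only those from the
    // collection, leaving the rest for a later call.
    public static String[] boundedGetBlockArray(Collection<String> blocks, int maxblocks) {
        int n = Math.max(0, Math.min(maxblocks, blocks.size()));
        String[] out = new String[n];
        Iterator<String> it = blocks.iterator();
        for (int i = 0; i < n; i++) {
            out[i] = it.next();
            it.remove();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> blocks =
            new ArrayList<>(Arrays.asList("b1", "b2", "b3", "b4", "b5"));
        // The naive version ignores maxblocks = 2 and returns all 5 entries.
        System.out.println(naiveGetBlockArray(blocks, 2).length);
        // The bounded version returns 2 entries and leaves 3 behind.
        System.out.println(boundedGetBlockArray(blocks, 2).length);
        System.out.println(blocks.size());
    }
}
```

With a bounded variant like this, each call hands out at most maxblocks entries, 
so the remaining invalid blocks would be spread across later calls rather than 
being sent in a single command.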

In our use case at CommonCrawl.org, we regularly do large-scale HDFS file 
deletions after certain stages of our map-reduce pipeline. These deletes would 
make certain DataNode(s) unresponsive, and thus impair the cluster's ability 
to properly balance file-system reads / writes across the whole available 
cluster. This problem only surfaced once we migrated from our 0.16.2 deployment 
to the current 0.18.1 release. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
