[ 
https://issues.apache.org/jira/browse/HDFS-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717797#comment-13717797
 ] 

Chris Nauroth commented on HDFS-5020:
-------------------------------------

Jing, the patch is looking really good.  Here is feedback on a few minor things:
{code}
    // for a retry request (of DataProtocol#blockReceivedAndDeleted with 
    // RECEIVED_BLOCK), we currently also decrease the approximate number. 
{code}
Minor nitpick: I think this was supposed say 
"DatanodeProtocol#blockReceivedAndDeleted".
{code}
  /**
   * An object that contains information about a block that 
   * is being replicated. It records the timestamp when the 
   * system started replicating the most recent copy of this
   * block. It also records the number of replication
   * requests that are in progress.
   */
  static class PendingBlockInfo {
{code}
Let's update the comment above to state that it tracks the individual data 
nodes to enforce idempotence (not just "number of replication requests").

{code}
    PendingBlockInfo(DatanodeDescriptor[] targets) {
      this.timeStamp = now();
      this.targets = targets == null ? new ArrayList<DatanodeDescriptor>(0)
          : new ArrayList<DatanodeDescriptor>(Arrays.asList(targets));
    }
{code}
Perhaps consider switching to the default constructor of {{ArrayList}}?  
Initializing with capacity 0 probably guarantees a reallocation will happen on 
a subsequent call to {{incrementReplicas}}.  The default constructor uses 
capacity 10, which is higher than typical settings for replication, so it would 
probably prevent reallocations.
{code}
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(
        DATANODE_COUNT).build();
    cluster.waitActive();
    
    DistributedFileSystem hdfs = cluster.getFileSystem();
    FSNamesystem fsn = cluster.getNamesystem();
    BlockManager blkManager = fsn.getBlockManager();
    
    try { 
{code}
{{MiniDFSCluster#waitAcitve}} and {{MiniDFSCluster#getFileSystem}} can throw 
{{IOException}}.  Right now, these calls are outside the try block, so if they 
do throw, then we wouldn't run the finally block to do a proper cleanup.  I 
recommend moving these 2 calls into the try block.
                
> Make DatanodeProtocol#blockReceivedAndDeleted idempotent
> --------------------------------------------------------
>
>                 Key: HDFS-5020
>                 URL: https://issues.apache.org/jira/browse/HDFS-5020
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-5020.001.patch, HDFS-5020.002.patch
>
>
> Currently DatanodeProtocol#blockReceivedAndDeleted is not idempotent because 
> pending replication maps in NN only tracks the number of the pending 
> replicas, and a retry request about a received block can cause the pending 
> number decreased more than once. We can make blockReceivedAndDeleted 
> idempotent by tracking corresponding datanodes in the pending replication 
> map. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to