[ 
https://issues.apache.org/jira/browse/HBASE-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964501#comment-14964501
 ] 

Heng Chen commented on HBASE-14648:
-----------------------------------

In the current HDFS logic, if connecting to a DN fails 3 times, that DN is 
added to the excluded list.
The relevant code, in {{DFSOutputStream.nextBlockOutputStream}}, is below:
{code}
....
      do {
        hasError = false;
        lastException.set(null);
        errorIndex = -1;
        success = false;

        DatanodeInfo[] excluded =
            excludedNodes.getAllPresent(excludedNodes.asMap().keySet())
            .keySet()
            .toArray(new DatanodeInfo[0]);
        block = oldBlock;
        lb = locateFollowingBlock(excluded.length > 0 ? excluded : null);
        block = lb.getBlock();
        block.setNumBytes(0);
        bytesSent = 0;
        accessToken = lb.getBlockToken();
        nodes = lb.getLocations();
        storageTypes = lb.getStorageTypes();

        //
        // Connect to first DataNode in the list.
        //
        success = createBlockOutputStream(nodes, storageTypes, 0L, false);

        if (!success) {
          DFSClient.LOG.info("Abandoning " + block);
          dfsClient.namenode.abandonBlock(block, fileId, src,
              dfsClient.clientName);
          block = null;
          DFSClient.LOG.info("Excluding datanode " + nodes[errorIndex]);
          excludedNodes.put(nodes[errorIndex], nodes[errorIndex]);
        }
      } while (!success && --count >= 0);
{code}
 
If the DNs restart slowly while the store keeps inserting quickly, the 
connections to all DNs will fail and every DN will end up in the excluded list.

So we have two approaches: one is to increase the number of retries when 
connecting to a DN, and the other is to increase the number of DNs.
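
To make the trade-off concrete, here is a rough toy model of the loop quoted above. It is only a sketch, not HDFS code: {{ExcludeLoopSketch}}, {{pickNode}} and the node names are hypothetical, the "namenode" simply hands back the first non-excluded node, and a node is excluded after a single failed connect, as in the quoted loop. The only thing borrowed from the real code is the {{while (!success && --count >= 0)}} shape.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ExcludeLoopSketch {

    /**
     * Hypothetical stand-in for the retry loop in nextBlockOutputStream.
     * Returns the node that accepted the connection, or empty if the loop
     * ran out of attempts or every node was already excluded.
     */
    static Optional<String> pickNode(List<String> datanodes, Set<String> down,
                                     int retries, Set<String> excluded) {
        int count = retries;
        do {
            // Toy "namenode": offer the first node that is not excluded.
            String candidate = null;
            for (String dn : datanodes) {
                if (!excluded.contains(dn)) {
                    candidate = dn;
                    break;
                }
            }
            if (candidate == null) {
                return Optional.empty();      // nothing left to try
            }
            if (!down.contains(candidate)) {
                return Optional.of(candidate); // connect succeeded
            }
            excluded.add(candidate);           // connect failed: exclude it
        } while (--count >= 0);
        return Optional.empty();               // retries exhausted
    }

    public static void main(String[] args) {
        // All three DNs still restarting: everything gets excluded.
        Set<String> excluded = new LinkedHashSet<>();
        System.out.println(pickNode(Arrays.asList("dn1", "dn2", "dn3"),
                new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")), 3, excluded));
        System.out.println("excluded: " + excluded);

        // One extra DN that is up: the same retry budget now succeeds.
        System.out.println(pickNode(Arrays.asList("dn1", "dn2", "dn3", "dn4"),
                new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")), 3,
                new LinkedHashSet<>()));

        // Four DNs down, one up: 3 retries are not enough, 4 are.
        List<String> five = Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5");
        Set<String> fourDown =
                new HashSet<>(Arrays.asList("dn1", "dn2", "dn3", "dn4"));
        System.out.println(pickNode(five, fourDown, 3, new LinkedHashSet<>()));
        System.out.println(pickNode(five, fourDown, 4, new LinkedHashSet<>()));
    }
}
```

In this model the write succeeds only if a reachable DN is offered before the attempts run out and before every DN has been excluded, so both knobs matter: more DNs leave something to fall back on, and more retries let the loop get that far.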





> Reenable TestWALProcedureStoreOnHDFS#testWalRollOnLowReplication
> ----------------------------------------------------------------
>
>                 Key: HBASE-14648
>                 URL: https://issues.apache.org/jira/browse/HBASE-14648
>             Project: HBase
>          Issue Type: Sub-task
>          Components: test
>            Reporter: stack
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
