[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972980#comment-16972980
 ] 

Yiqun Lin commented on HDFS-14648:
----------------------------------

The latest patch looks great. Some more comments:

*ClientContext.java*
 We need a method to stop the dead node detector thread, and it should be
called in DFSClient#close.
{code:java}
  /**
   * Close dead node detector thread.
   */
  public void stopDeadNodeDetectorThread() {
    if (deadNodeDetectorThr != null) {
      deadNodeDetectorThr.interrupt();
      try {
        deadNodeDetectorThr.join(3000);
      } catch (InterruptedException e) {
        LOG.warn("Encountered exception while waiting to join on dead "
            + "node detector thread.", e);
      }
    }
  }

.....
  public synchronized void close() throws IOException {
    if(clientRunning) {
      ...
      // close dead node detector thread
      clientContext.stopDeadNodeDetectorThread();
    }
  }
{code}
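
To illustrate the interrupt-then-join shutdown in isolation, here is a minimal standalone sketch. The class {{DetectorShutdownSketch}} and its methods are hypothetical stand-ins, not part of the patch or of ClientContext. Restoring the interrupt flag in the catch block is my own addition rather than something the patch does.

```java
/**
 * Hypothetical stand-in for the client context; not the real ClientContext.
 */
final class DetectorShutdownSketch {
  private final Thread detectorThread;

  DetectorShutdownSketch(Runnable detector) {
    detectorThread = new Thread(detector, "dead-node-detector-sketch");
    detectorThread.setDaemon(true);
    detectorThread.start();
  }

  /**
   * Interrupt the detector thread and wait up to timeoutMs for it to exit.
   * Returns true if the thread has terminated by the time we return.
   */
  boolean stop(long timeoutMs) {
    detectorThread.interrupt();
    try {
      detectorThread.join(timeoutMs);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status instead of swallowing it.
      Thread.currentThread().interrupt();
    }
    return !detectorThread.isAlive();
  }

  /** A polling loop that exits promptly once its thread is interrupted. */
  static Runnable interruptibleLoop() {
    return () -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          Thread.sleep(100);
        } catch (InterruptedException e) {
          return; // interrupted while sleeping: leave the loop
        }
      }
    };
  }
}
```

The join only returns early if the detector's run loop reacts to the interrupt, so the detector itself must check the interrupt status or use interruptible blocking calls.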

 *DFSInputStream.java*
 I don't see a call to {{dfsClient.addNodeToDeadNodeDetector}} added in the
{{createBlockReader}} method of this class.

 *DFSStripedInputStream.java*
 Can we remove the {{dfsClient.addNodeToDeadNodeDetector}} call in this class?
Dead node detection is not expected to be enabled in EC mode.
{code:java}
           fetchBlockAt(block.getStartOffset());
-          addToDeadNodes(dnInfo.info);
+          addToLocalDeadNodes(dnInfo.info);
+          dfsClient.addNodeToDeadNodeDetector(this, dnInfo.info);   <=== to be removed
         }
{code}

Can we also fix this whitespace warning?
{noformat}
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDeadNodeDetection.java:113:
  public void testDeadNodeDetectionInMultipleDFSInputStream() 
{noformat}
Everything else looks good to me now.
  

> DeadNodeDetector basic model
> ----------------------------
>
>                 Key: HDFS-14648
>                 URL: https://issues.apache.org/jira/browse/HDFS-14648
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch, HDFS-14648.005.patch, 
> HDFS-14648.006.patch, HDFS-14648.007.patch, HDFS-14648.008.patch, 
> HDFS-14648.009.patch, HDFS-14648.010.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements
> the following functions:
>  # When a DFSInputStream is opened, a BlockReader is opened. If a DataNode
> holding the block is found to be inaccessible, the DataNode is put into
> DeadNodeDetector#deadnode. HDFS-14649 will optimize this part: when a
> DataNode is not accessible, it is likely that the replica has been removed
> from the DataNode, so this needs to be confirmed by re-probing and requires
> higher-priority processing.
>  # DeadNodeDetector periodically probes the nodes in
> DeadNodeDetector#deadnode. If access succeeds, the node is removed from
> DeadNodeDetector#deadnode. Continuous probing of dead nodes is necessary
> because a DataNode may rejoin the cluster after a service restart or machine
> repair. Without a probe mechanism, the DataNode could be permanently
> excluded.
>  # DeadNodeDetector#dfsInputStreamNodes records the DataNodes used by each
> DFSInputStream. When a DFSInputStream is closed, it is removed from
> DeadNodeDetector#dfsInputStreamNodes.
>  # Every time the global dead nodes are fetched, DeadNodeDetector#deadnode is
> updated. The new DeadNodeDetector#deadnode equals the intersection of the old
> DeadNodeDetector#deadnode and the DataNodes recorded in
> DeadNodeDetector#dfsInputStreamNodes.
>  # DeadNodeDetector has a switch that is turned off by default. When it is
> off, each DFSInputStream still uses its own local dead nodes.
>  # This feature has been used in the XIAOMI production environment for a long
> time. It has reduced the number of HBase reads that get stuck due to node hangs.
>  # To use DeadNodeDetector, just turn on the switch. There are no other
> restrictions. If you don't want to use DeadNodeDetector, just turn it off.
> {code:java}
> if (sharedDeadNodesEnabled && deadNodeDetector == null) {
>   deadNodeDetector = new DeadNodeDetector(name);
>   deadNodeDetectorThr = new Daemon(deadNodeDetector);
>   deadNodeDetectorThr.start();
> }
> {code}
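
The bookkeeping described in the quoted list (adding a dead node, removing it after a successful probe, dropping a closed stream, and intersecting on read) can be sketched roughly as follows. This is only an illustration under stated assumptions: {{DeadNodeSetSketch}} and all of its method names are hypothetical, DataNodes and streams are represented as plain strings, and the real DeadNodeDetector may be structured quite differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the shared dead-node bookkeeping; not the patch. */
final class DeadNodeSetSketch {
  // Stand-in for DeadNodeDetector#deadnode.
  private final Set<String> deadNodes = new HashSet<>();
  // Stand-in for DeadNodeDetector#dfsInputStreamNodes: stream id -> nodes.
  private final Map<String, Set<String>> streamNodes = new HashMap<>();

  /** A stream found a node inaccessible: record it as dead. */
  synchronized void addDeadNode(String streamId, String node) {
    deadNodes.add(node);
    streamNodes.computeIfAbsent(streamId, k -> new HashSet<>()).add(node);
  }

  /** A periodic probe reached the node again: it is no longer dead. */
  synchronized void probeSucceeded(String node) {
    deadNodes.remove(node);
  }

  /** A closed stream no longer contributes any nodes. */
  synchronized void closeStream(String streamId) {
    streamNodes.remove(streamId);
  }

  /**
   * New dead set = old dead set intersected with the nodes still
   * recorded for open streams. Returns a defensive copy.
   */
  synchronized Set<String> getDeadNodes() {
    Set<String> inUse = new HashSet<>();
    for (Set<String> nodes : streamNodes.values()) {
      inUse.addAll(nodes);
    }
    deadNodes.retainAll(inUse);
    return new HashSet<>(deadNodes);
  }
}
```

For example, if streams "s1" and "s2" report "dn1" and "dn2" as dead and "s1" is then closed, the next getDeadNodes() call returns only "dn2".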


