[ 
https://issues.apache.org/jira/browse/HDFS-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216172#comment-17216172
 ] 

Lisheng Sun commented on HDFS-15605:
------------------------------------

Hi [~LiJinglun] 

Sorry, I am only replying to your message now.

I understand that your improvement makes the datanode state depend completely 
on the namenode. But there are some problems with that, which is why I did not 
choose to obtain the datanode status from the namenode.
 # The datanode status in the namenode is not real time: by default the 
namenode only marks a datanode dead after about 10 minutes without a heartbeat. 
On top of that, there will be a delay from the periodic calls to the 
getDatanodeReport RPC. So the client may not get the correct status of the 
datanode. 
During that window, all DFSInputStreams of the same client will keep reading 
from the dead node whose status has not been updated in time. Is that delay 
acceptable for your online HBase cluster?
 # getDatanodeReport is a heavy call, and on a big cluster it puts pressure on 
the namenode.
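
To make the staleness concern in the first point concrete, here is a rough 
sketch of the worst-case delay. It assumes the stock HDFS defaults 
(dfs.namenode.heartbeat.recheck-interval = 5 minutes, dfs.heartbeat.interval = 
3 seconds) and the namenode expiry rule of 2 * recheck interval + 10 * 
heartbeat interval. The class name StalenessEstimate and the one-minute client 
poll interval are hypothetical and only there for illustration.

```java
import java.time.Duration;

/** Illustrative arithmetic only; not part of any HDFS patch. */
final class StalenessEstimate {

    /** Time without heartbeats after which the namenode marks a datanode dead. */
    static Duration nameNodeExpiry(Duration recheckInterval, Duration heartbeatInterval) {
        return recheckInterval.multipliedBy(2).plus(heartbeatInterval.multipliedBy(10));
    }

    /**
     * Worst case seen by a client that polls the namenode periodically:
     * the namenode marks the node dead just after the client's last poll.
     */
    static Duration worstCaseClientDelay(Duration expiry, Duration clientPollInterval) {
        return expiry.plus(clientPollInterval);
    }

    public static void main(String[] args) {
        Duration expiry = nameNodeExpiry(Duration.ofMinutes(5), Duration.ofSeconds(3));
        // The poll interval below is an assumed value, not an HDFS default.
        Duration worst = worstCaseClientDelay(expiry, Duration.ofMinutes(1));
        System.out.println("NameNode expiry: " + expiry.getSeconds() + " s");
        System.out.println("Worst-case client delay: " + worst.getSeconds() + " s");
    }
}
```

With these inputs the expiry comes out at 630 seconds, so a client relying only 
on the namenode could keep reading from a dead node for more than 10 minutes.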

 

Can we solve the problem you described by adjusting the elimination strategy 
on the client? When a datanode is found dead for the first time, or when a 
whole batch of nodes is found dead in a row, the result needs to be confirmed 
again before the nodes are marked dead. 
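
A minimal sketch of what such a confirm-again strategy could look like on the 
client, assuming a node is only treated as dead after a configurable number of 
consecutive failed probes. DeadNodeConfirmer and its methods are hypothetical 
names for illustration; they are not part of DeadNodeDetector or of any 
attached patch, and the special handling of a whole batch of failures is left 
out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: a node is declared dead only after repeated failed probes. */
final class DeadNodeConfirmer {
    private final int requiredFailures;
    private final Map<String, Integer> suspects = new HashMap<>();
    private final Set<String> dead = new HashSet<>();

    DeadNodeConfirmer(int requiredFailures) {
        if (requiredFailures < 1) {
            throw new IllegalArgumentException("requiredFailures must be >= 1");
        }
        this.requiredFailures = requiredFailures;
    }

    /** Records one failed probe; returns true once the node is confirmed dead. */
    synchronized boolean recordFailure(String datanodeId) {
        if (dead.contains(datanodeId)) {
            return true;
        }
        int failures = suspects.merge(datanodeId, 1, Integer::sum);
        if (failures >= requiredFailures) {
            suspects.remove(datanodeId);
            dead.add(datanodeId);
            return true;
        }
        return false; // still only a suspect, keep it readable for now
    }

    /** A successful probe clears any suspicion and revives the node. */
    synchronized void recordSuccess(String datanodeId) {
        suspects.remove(datanodeId);
        dead.remove(datanodeId);
    }

    synchronized boolean isDead(String datanodeId) {
        return dead.contains(datanodeId);
    }
}
```

With requiredFailures set to 2, a single timed-out probe (for example one 
caused by a busy client-side thread pool) leaves the node in the suspect state, 
and only a second consecutive failure moves it to the dead list.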

 

> DeadNodeDetector supports getting deadnode from NameNode.
> ---------------------------------------------------------
>
>                 Key: HDFS-15605
>                 URL: https://issues.apache.org/jira/browse/HDFS-15605
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15605.001.patch, HDFS-15605.002.patch, 
> HDFS-15605.003.patch
>
>
> When we are using DeadNodeDetector, sometimes it marks too many nodes as dead 
> and causes read failures. The DeadNodeDetector assumes that all datanodes 
> whose getDatanodeInfo RPCs fail to return in time are dead. But that is not 
> always true. A client-side error or a slow RPC in the DataNode might get a 
> node marked as dead too. For example, a client-side delay in the 
> rpcThreadPool might cause the getDatanodeInfo RPCs to time out, adding many 
> datanodes to the dead list.
> We have a simple improvement for this: the NameNode already knows which 
> datanodes are dead, so just update the dead list from the NameNode using 
> DFSClient.datanodeReport().



