[ 
https://issues.apache.org/jira/browse/HDFS-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918544#comment-13918544
 ] 

jack levin commented on HDFS-6022:
----------------------------------

Colin, thanks for the comments.  12631558/HADOOP-6022.patch is the trunk patch.

>This seems unnecessary to me. We don't need to be constantly trying to connect 
>to dead nodes. Just expire them from a Guava cache after a few minutes, like 
>we do in DFSOutputStream now.

I think the problem with that is as follows.  Suppose the deadNodes list 
expires a datanode that is not yet back up (perhaps it is down for an extended 
period because of a power outage).  All DFSClient input streams will then try 
to open a TCP connection to the dead datanode again.  This is especially 
problematic for HBase, because the 60-second TCP timeout causes connection and 
thread pileups and, ultimately, HBase downtime.  DeadNodeVerifier ensures that 
this does not happen.  A Guava cache would work, but in our environment it 
would mean instability in HBase whenever it tries to access blocks on a 
semi-permanently dead DN.
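
To make the concern concrete, here is a minimal Java sketch of time-based 
expiry.  It is a hand-rolled TTL map standing in for a Guava cache, not the 
DFSClient or DFSOutputStream code; the class name ExpiringDeadNodes, its 
methods, and the injected clock are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for a time-expiring dead-node cache.
 * Illustration only; not the DFSClient or Guava implementation.
 */
final class ExpiringDeadNodes {
    // datanode name -> time (ms) at which it was marked dead
    private final ConcurrentMap<String, Long> markedAtMillis = new ConcurrentHashMap<>();
    private final long ttlMillis;
    // Injectable clock so expiry can be exercised without sleeping.
    private final LongSupplier clock;

    ExpiringDeadNodes(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Record that a reader failed to reach this datanode. */
    void markDead(String datanode) {
        markedAtMillis.put(datanode, clock.getAsLong());
    }

    /**
     * True while the entry is younger than the TTL. Once the TTL has passed,
     * the entry is dropped whether or not the datanode has actually come back.
     */
    boolean isDead(String datanode) {
        Long markedAt = markedAtMillis.get(datanode);
        if (markedAt == null) {
            return false;
        }
        if (clock.getAsLong() - markedAt >= ttlMillis) {
            markedAtMillis.remove(datanode, markedAt);
            return false;
        }
        return true;
    }
}
```

In this sketch nothing checks the datanode before the entry is dropped, so 
after the TTL every reader that consults isDead() will try the node again and 
wait out the connect timeout if it is still down.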

I will move deadNodes to ClientContext.

Thanks.

-Jack

> Moving deadNodes from being thread local. Improving dead datanode handling in 
> DFSClient 
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6022
>                 URL: https://issues.apache.org/jira/browse/HDFS-6022
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 0.23.9, 0.23.10, 2.2.0, 2.3.0
>            Reporter: Jack Levin
>              Labels: patch
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HADOOP-6022.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> This patch solves the problem of the deadNodes list being thread local.  The 
> deadNodes list is created by DFSClient when there are problems writing to, 
> reading from, or contacting a datanode.  The problem is that deadNodes is not 
> visible to other DFSInputStream threads, so every DFSInputStream ends up 
> building its own deadNodes.  This affects the performance of DFSClient to a 
> large degree, especially when a datanode goes completely offline (all 
> DFSInputStream threads experience a TCP connect delay, which affects the 
> performance of the whole cluster).
> This patch makes deadNodes global in the DFSClient class, so that as soon as 
> a single DFSInputStream thread reports a dead datanode, all other 
> DFSInputStream threads are informed, removing the need for each to create its 
> own independent list (really a concurrent Map). 
> Further, a global deadNodes health check manager thread (DeadNodeVerifier) is 
> created to verify all dead datanodes every 5 seconds and to remove a datanode 
> from the list as soon as it is up.  Under normal conditions (deadNodes empty) 
> that thread sleeps.  If deadNodes is not empty, the thread attempts to open a 
> TCP connection to the affected datanodes every 5 seconds.
> This patch has a test (TestDFSClientDeadNodes) that is quite simple: since 
> deadNodes creation is not affected by the patch, we test only datanode 
> removal from deadNodes by the health check manager thread.  The test creates 
> a file in a DFS minicluster, reads from the same file rapidly, causes a 
> datanode to restart, and checks that the health check manager thread does the 
> right thing, removing the live datanode from the global deadNodes list.
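
The shared list and periodic verification described above can be sketched 
roughly as follows.  This is an illustration under stated assumptions, not 
the code in the attached patch: the class SharedDeadNodes, its methods, and 
the pluggable probe are hypothetical, and a concurrent set is used here where 
the description mentions a concurrent Map.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical shared dead-node registry; not the patch's DeadNodeVerifier. */
final class SharedDeadNodes {
    // One set shared by all reader threads instead of one list per stream.
    private final Set<InetSocketAddress> deadNodes = ConcurrentHashMap.newKeySet();
    // Returns true when the datanode is reachable again (assumed pluggable here).
    private final Predicate<InetSocketAddress> probe;

    SharedDeadNodes(Predicate<InetSocketAddress> probe) {
        this.probe = probe;
    }

    /** Called by any reader that fails to reach a datanode. */
    void markDead(InetSocketAddress datanode) {
        deadNodes.add(datanode);
    }

    /** Consulted by every reader before choosing a datanode. */
    boolean isDead(InetSocketAddress datanode) {
        return deadNodes.contains(datanode);
    }

    /** One verification pass: drop every node whose probe succeeds. */
    void verifyOnce() {
        deadNodes.removeIf(probe);
    }

    /** Run verifyOnce() repeatedly on a daemon thread, e.g. every 5 seconds. */
    ScheduledExecutorService startVerifier(long periodSeconds) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "dead-node-verifier");
                thread.setDaemon(true);
                return thread;
            });
        scheduler.scheduleWithFixedDelay(
            this::verifyOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    /** Example probe: a TCP connect bounded by a short timeout. */
    static boolean tcpProbe(InetSocketAddress datanode, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(datanode, timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A caller might construct it as 
new SharedDeadNodes(dn -> SharedDeadNodes.tcpProbe(dn, 1000)) and then call 
startVerifier(5).  Only the single verifier thread pays the connect cost for a 
node that is still down; readers just consult isDead().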



--
This message was sent by Atlassian JIRA
(v6.2#6252)
