[jira] [Commented] (HDFS-6022) Moving deadNodes from being thread local. Improving dead datanode handling in DFSClient

Hadoop QA (JIRA) Thu, 27 Feb 2014 18:18:11 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915366#comment-13915366
 ]


Hadoop QA commented on HDFS-6022:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12631647/HADOOP-6022.patch
  against trunk revision .

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6263//console

This message is automatically generated.

> Moving deadNodes from being thread local. Improving dead datanode handling in 
> DFSClient 
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6022
>                 URL: https://issues.apache.org/jira/browse/HDFS-6022
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 0.23.9, 0.23.10, 2.2.0, 2.3.0
>            Reporter: Jack Levin
>              Labels: patch
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HADOOP-6022.patch, HADOOP-6022.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> This patch solves the problem of the deadNodes list being thread local.  The 
> deadNodes list is built by DFSClient when it has problems reading from, 
> writing to, or contacting a datanode.  The problem is that deadNodes is not 
> visible to other DFSInputStream threads, hence every DFSInputStream ends up 
> building its own deadNodes.  This affects the performance of DFSClient to a 
> large degree, especially when a datanode goes completely offline (there is a 
> tcp connect delay experienced by all DFSInputStream threads, affecting 
> performance of the whole cluster).
> This patch moves deadNodes to be global in the DFSClient class so that as 
> soon as a single DFSInputStream thread reports a dead datanode, all other 
> DFSInputStream threads are informed, negating the need to create their own 
> independent lists (a concurrent Map, really). 
> Further, a global deadNodes health check manager thread (DeadNodeVerifier) is 
> created to verify all dead datanodes every 5 seconds, and to remove a 
> datanode from the list as soon as it is back up.  Under normal conditions 
> (deadNodes empty) that thread would be sleeping.  If deadNodes is not empty, 
> the thread will attempt to open a tcp connection to each affected datanode 
> every 5 seconds.
> This patch has a test (TestDFSClientDeadNodes) that is quite simple: since 
> the deadNodes creation is not affected by the patch, we only test datanode 
> removal from deadNodes by the health check manager thread.  The test creates 
> a file in a dfs minicluster, reads from the same file rapidly, causes a 
> datanode to restart, and tests whether the health check manager thread does 
> the right thing, removing the alive datanode from the global deadNodes list.
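
For readers without access to the attachment, the design described above can
be sketched roughly as follows. This is not the code from the patch: the class
SharedDeadNodes, its methods, the pluggable probe, and the 1 second connect
timeout are all hypothetical names and values chosen for illustration. Only
the shared concurrent collection, the DeadNodeVerifier thread name, and the
5 second interval come from the description.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical client-wide dead-node registry; a sketch, not the patch. */
class SharedDeadNodes {
    // One concurrent set shared by every reader thread of the client,
    // instead of one private list per input stream.
    private final Set<InetSocketAddress> deadNodes = ConcurrentHashMap.newKeySet();

    // Returns true when a node is reachable again. Pluggable so the
    // removal logic can be exercised without real network access.
    private final Predicate<InetSocketAddress> probe;

    SharedDeadNodes(Predicate<InetSocketAddress> probe) {
        this.probe = probe;
    }

    /** Called by any reader that fails to reach a datanode. */
    void markDead(InetSocketAddress node) {
        deadNodes.add(node);
    }

    /** Consulted by every reader before picking a datanode. */
    boolean isDead(InetSocketAddress node) {
        return deadNodes.contains(node);
    }

    /** One verification pass: drop every node whose probe succeeds. */
    void verifyOnce() {
        deadNodes.removeIf(probe);
    }

    /** Starts a daemon thread that re-checks the dead nodes periodically. */
    ScheduledExecutorService startVerifier(long periodSeconds) {
        ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "DeadNodeVerifier");
                t.setDaemon(true);
                return t;
            });
        // With an empty set each pass is a no-op, so the thread is
        // effectively idle under normal conditions.
        ses.scheduleWithFixedDelay(this::verifyOnce,
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    /** Default probe: a plain TCP connect (1 second timeout is assumed). */
    static boolean tcpProbe(InetSocketAddress node) {
        try (Socket s = new Socket()) {
            s.connect(node, 1000);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Under these assumptions a client would create a single
new SharedDeadNodes(SharedDeadNodes::tcpProbe), call startVerifier(5) once,
and hand the same instance to every input stream it opens.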



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
