[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862875#comment-13862875
 ] 

LiuLei commented on HDFS-4273:
------------------------------

Hi Binglin,
We could add an isLocalNodeDead attribute to the DFSClient object. When a 
DFSInputStream object finds that the local datanode is dead, it sets 
isLocalNodeDead to true. We also need a thread that checks whether the local 
datanode is alive; once the datanode is alive again, the thread sets 
isLocalNodeDead back to false. When a DFSInputStream object chooses a 
datanode, it should check isLocalNodeDead first.
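
A minimal sketch of the idea, for illustration only. The class name 
LocalNodeHealth, the livenessProbe callback and the method names are 
hypothetical and are not existing DFSClient APIs; how the probe actually 
checks the local datanode is left open here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper that a DFSClient could own; not an existing HDFS class. */
class LocalNodeHealth implements AutoCloseable {
    // Shared by all input streams of one client, so it must be thread-safe.
    private final AtomicBoolean isLocalNodeDead = new AtomicBoolean(false);
    // Assumption: returns true when the local datanode answers a health check.
    private final BooleanSupplier livenessProbe;
    private final ScheduledExecutorService scheduler;

    LocalNodeHealth(BooleanSupplier livenessProbe) {
        this.livenessProbe = livenessProbe;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "local-dn-probe");
            t.setDaemon(true); // do not keep the JVM alive for the probe
            return t;
        });
    }

    /** Called by an input stream that failed to read from the local datanode. */
    void markDead() {
        isLocalNodeDead.set(true);
    }

    /** Checked by an input stream before it picks the local datanode. */
    boolean isLocalNodeDead() {
        return isLocalNodeDead.get();
    }

    /** One probe pass: clear the flag only if it is set and the node responds. */
    void probeOnce() {
        try {
            if (isLocalNodeDead.get() && livenessProbe.getAsBoolean()) {
                isLocalNodeDead.set(false);
            }
        } catch (RuntimeException e) {
            // A failing probe means the node is still considered dead.
        }
    }

    /** Start the background detection thread. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
            this::probeOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With something like this, the datanode selection code would skip the local 
datanode while isLocalNodeDead() returns true, instead of keeping it in 
deadNodes forever.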

We can create another JIRA to discuss this.


> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, 
> HDFS-4273.v7.patch, TestDFSInputStream.java
>
>
> The following issues in DFSInputStream are addressed in this JIRA:
> 1. In some cases read does not retry enough times, which causes early failure.
> Assume the following call sequence:
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a 
> different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clears deadNodes and picks currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception and may fail the read call before 
> MaxBlockAcquireFailures attempts have been made.
> {noformat} 
> 2. In multi-threaded scenarios (like HBase), DFSInputStream.failures has a race 
> condition: it is reset to 0 while it is still in use by another thread, so 
> some read threads may never quit. Changing failures to a local 
> variable solves this issue.
> 3. If the local datanode is added to deadNodes, it is not removed from 
> deadNodes when the DN comes back alive. We need a way to remove the local 
> datanode from deadNodes once it becomes live again.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
