[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-4273:
--------------------------------

    Description: 
The following issues in DFSInputStream are addressed in this JIRA:
1. read may not retry enough times in some cases, causing early failure.
Assume the following call sequence:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing: clear deadNodes and pick currentNode again
        seekToNewSource() returns false
     readBuffer() rethrows the exception and quits the loop
readWithStrategy() receives the exception and may fail the read call before MaxBlockAcquireFailures attempts have been made.
{noformat}

2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race
condition: it can be reset to 0 while another thread is still using it, so a
thread's failure count may never reach the limit and that read thread may never
quit. Changing failures to a local variable solves this issue.

3. If the local datanode is added to deadNodes, it is not removed from
deadNodes when the DN comes back alive. We need a way to remove the local
datanode from deadNodes once it becomes live again.
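The Java sketch below is a simplified, hypothetical model of a replica-failover read loop that illustrates the three points above. It is not the DFSInputStream code. Only the names deadNodes and maxBlockAcquireFailures come from the issue text. The class RetryLoopSketch, the ReplicaReader interface, and the onDataNodeLive hook are invented for illustration. In this model, the loop gives up only after the full retry budget is spent (issue 1). The failure counter is a local variable (issue 2). A datanode can be taken out of deadNodes when it is seen live again (issue 3). The real client also refetches block locations and backs off between rounds, and this sketch omits that.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of a replica-failover read loop.
 * NOT the real DFSInputStream; names other than deadNodes and
 * maxBlockAcquireFailures are invented for illustration.
 */
public class RetryLoopSketch {

    /** Hypothetical stand-in for reading one block from one datanode. */
    public interface ReplicaReader {
        int read(String datanode) throws IOException;
    }

    private final List<String> replicas;
    private final ReplicaReader reader;
    private final int maxBlockAcquireFailures;
    // Shared by all threads using this instance, so it must be thread-safe.
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    public RetryLoopSketch(List<String> replicas, ReplicaReader reader,
                           int maxBlockAcquireFailures) {
        this.replicas = replicas;
        this.reader = reader;
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
    }

    /** Returns the first replica not marked dead, or null if all are dead. */
    private String chooseDataNode() {
        for (String dn : replicas) {
            if (!deadNodes.contains(dn)) {
                return dn;
            }
        }
        return null;
    }

    public int read() throws IOException {
        // Issue 2: a local counter cannot be reset by another thread.
        int failures = 0;
        while (true) {
            String dn = chooseDataNode();
            if (dn == null) {
                // Issue 1: fail only after the whole retry budget is spent,
                // not as soon as failover lands on the same node again.
                if (failures >= maxBlockAcquireFailures) {
                    throw new IOException("Could not obtain block after "
                            + failures + " retries");
                }
                failures++;
                deadNodes.clear(); // forget dead nodes and retry every replica
                continue;
            }
            try {
                return reader.read(dn);
            } catch (IOException e) {
                deadNodes.add(dn); // fail over to the next replica
            }
        }
    }

    /** Issue 3: hypothetical hook to call when a datanode is live again. */
    public void onDataNodeLive(String dn) {
        deadNodes.remove(dn);
    }

    public boolean isDead(String dn) {
        return deadNodes.contains(dn);
    }
}
{code}

Because failures lives on the stack of each read() call, concurrent readers that share one instance each get their own retry budget. deadNodes stays shared, so a bad replica found by one thread is skipped by the others. With a single replica that fails twice and then recovers, this loop still succeeds on the third attempt instead of failing on the first rethrow.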

  was:
Follow issues in DFSInputStream is address in this jira:
1. read may not retry enough in some cases cause early failure
Assume the following call logic
{noformat} 
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
     -> reader.doRead()
     -> seekToNewSource() add currentNode to deadnode, wish to get a different datanode
        -> blockSeekTo()
           -> chooseDataNode()
              -> block missing, clear deadNodes and pick the currentNode again
        seekToNewSource() return false
     readBuffer() re-throw the exception quit loop
readWithStrategy() got the exception,  and may fail the read call before tried MaxBlockAcquireFailures.
{noformat} 

2. In multi-threaded scenario(like hbase), DFSInputStream.failures has race 
condition, it cleared to 0 when it is still used by other thread. So it is 
possible that  some read thread may never quit.

3. If local datanode is added to deadNodes, it will not be removed from 
deadNodes if DN is back alive. We need a way to remove local datanode from 
deadNodes when the local datanode is become live.


> Fix some issue in DFSInputstream
> --------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, 
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, 
> HDFS-4273.v7.patch, TestDFSInputStream.java
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
