[
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Binglin Chang updated HDFS-4273:
--------------------------------
Description:
The following issues in DFSInputStream are addressed in this jira:
1. read may not retry enough in some cases, causing early failure.
Assume the following call logic:
{noformat}
readWithStrategy()
-> blockSeekTo()
-> readBuffer()
-> reader.doRead()
-> seekToNewSource() adds currentNode to deadNodes, hoping to get a different
datanode
-> blockSeekTo()
-> chooseDataNode()
-> block missing, clears deadNodes and picks the currentNode again
seekToNewSource() returns false
readBuffer() re-throws the exception and quits the loop
readWithStrategy() gets the exception, and may fail the read call before having
tried MaxBlockAcquireFailures times.
{noformat}
2. In multi-threaded scenarios (like HBase), DFSInputStream.failures has a race
condition: it can be cleared to 0 while it is still in use by another thread. So
it is possible that some read threads may never quit.
3.
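The race in item 2 can be sketched as follows. This is a minimal illustration, not the actual DFSInputStream code: the class and method names (FailureCounterSketch, SharedCounterStream, PerCallCounter) and the limit of 3 are hypothetical, and the per-call counter is only one possible way to avoid the problem, not necessarily what the attached patches do. The interleaving is replayed deterministically in a single thread so the effect is easy to see.

```java
// Hypothetical illustration only; not the actual DFSInputStream code.
class FailureCounterSketch {
    // Assumed retry limit, standing in for MaxBlockAcquireFailures.
    static final int MAX_FAILURES = 3;

    // Racy pattern: one counter shared by every reader of the stream.
    // A reader that succeeds resets it to 0, erasing the failures that a
    // concurrently failing reader has accumulated.
    static class SharedCounterStream {
        int failures = 0;

        void onSuccess() { failures = 0; }

        // Returns true when the caller should give up.
        boolean onFailure() { return ++failures >= MAX_FAILURES; }
    }

    // One possible alternative: each read call keeps its own counter.
    static class PerCallCounter {
        private int failures = 0;

        boolean onFailure() { return ++failures >= MAX_FAILURES; }
    }

    // Replays the interleaving "reader A fails, reader B succeeds" for the
    // given number of rounds and reports whether reader A ever gives up.
    static boolean sharedReaderGivesUp(int rounds) {
        SharedCounterStream stream = new SharedCounterStream();
        for (int i = 0; i < rounds; i++) {
            if (stream.onFailure()) return true; // reader A fails
            stream.onSuccess();                  // reader B resets the counter
        }
        return false;
    }

    // Same interleaving, but reader B's success cannot touch A's counter.
    static boolean perCallReaderGivesUp(int rounds) {
        PerCallCounter readerA = new PerCallCounter();
        for (int i = 0; i < rounds; i++) {
            if (readerA.onFailure()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("shared counter gives up:   " + sharedReaderGivesUp(100));
        System.out.println("per-call counter gives up: " + perCallReaderGivesUp(100));
    }
}
```

With the shared counter, the failing reader never reaches the limit as long as some other reader keeps succeeding in between, which matches the "may never quit" symptom described above.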
was:
Assume the following call logic:
{noformat}
readWithStrategy()
-> blockSeekTo()
-> readBuffer()
-> reader.doRead()
-> seekToNewSource() adds currentNode to deadNodes, hoping to get a different
datanode
-> blockSeekTo()
-> chooseDataNode()
-> block missing, clears deadNodes and picks the currentNode again
seekToNewSource() returns false
readBuffer() re-throws the exception and quits the loop
readWithStrategy() gets the exception, and may fail the read call before having
tried MaxBlockAcquireFailures times.
{noformat}
Some issues with this logic:
1. The seekToNewSource() logic is broken because deadNodes may be cleared in the
middle of it.
2. The variable "int retries=2" in readWithStrategy() seems to conflict with
MaxBlockAcquireFailures; should it be removed?
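The first issue (deadNodes being cleared in the middle of seekToNewSource()) can be illustrated with a heavily simplified sketch. Everything here is an assumption made for illustration: DeadNodeSketch, chooseNode(), and the string node names are hypothetical stand-ins, and the real chooseDataNode() does more work (such as refetching block locations) that is omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the actual DFSInputStream code.
class DeadNodeSketch {
    final List<String> replicas;
    final Set<String> deadNodes = new HashSet<>();

    DeadNodeSketch(List<String> replicas) {
        this.replicas = replicas;
    }

    // Simplified stand-in for chooseDataNode(): return the first replica
    // not marked dead; if all are dead, clear the dead set and start over.
    String chooseNode() {
        for (String replica : replicas) {
            if (!deadNodes.contains(replica)) {
                return replica;
            }
        }
        deadNodes.clear();        // the mark added by the caller is lost here
        return replicas.get(0);
    }

    // Simplified stand-in for seekToNewSource(): mark the current node dead
    // and report whether a different node was obtained.
    boolean seekToNewSource(String currentNode) {
        deadNodes.add(currentNode);
        String next = chooseNode();
        return !next.equals(currentNode);
    }

    public static void main(String[] args) {
        DeadNodeSketch single = new DeadNodeSketch(Collections.singletonList("dn1"));
        System.out.println("single replica, new source found: " + single.seekToNewSource("dn1"));

        DeadNodeSketch pair = new DeadNodeSketch(Arrays.asList("dn1", "dn2"));
        System.out.println("two replicas, new source found:   " + pair.seekToNewSource("dn1"));
    }
}
```

In the single-replica case the node just marked dead is chosen again, so seekToNewSource() returns false and the caller sees a failure even though the retry budget has not been used up.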
> Fix some issue in DFSInputstream
> --------------------------------
>
> Key: HDFS-4273
> URL: https://issues.apache.org/jira/browse/HDFS-4273
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.2-alpha
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Priority: Minor
> Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch,
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch,
> HDFS-4273.v7.patch, TestDFSInputStream.java
>
>
> The following issues in DFSInputStream are addressed in this jira:
> 1. read may not retry enough in some cases, causing early failure.
> Assume the following call logic:
> {noformat}
> readWithStrategy()
> -> blockSeekTo()
> -> readBuffer()
> -> reader.doRead()
> -> seekToNewSource() adds currentNode to deadNodes, hoping to get a
> different datanode
> -> blockSeekTo()
> -> chooseDataNode()
> -> block missing, clears deadNodes and picks the currentNode again
> seekToNewSource() returns false
> readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception, and may fail the read call before
> having tried MaxBlockAcquireFailures times.
> {noformat}
> 2. In multi-threaded scenarios (like HBase), DFSInputStream.failures has a race
> condition: it can be cleared to 0 while it is still in use by another thread. So
> it is possible that some read threads may never quit.
> 3.