[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masatake Iwasaki updated HDFS-4273:
-----------------------------------
Resolution: Won't Fix
Target Version/s: 2.0.3-alpha, 3.0.0 (was: 3.0.0, 2.0.3-alpha)
Status: Resolved (was: Patch Available)
I looked into the tests added by the .v8 patch.
{{TestDFSClientRetries#testDFSInputStreamReadRetryTime}} expects the client to always retry up to maxBlockAcquireFailures times, but that is not how the client behaves: it does not retry the same node on a ChecksumException.
{{seekToNewSource}} returning false means there are no more candidate datanodes, so it is right to give up even if the retry count (failures) has not reached the maximum.
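To make the expected behaviour concrete, here is a minimal, hypothetical Java model of such a retry loop. The names RetrySketch, Source, ChecksumError and readWithRetries are invented for illustration and are not the DFSInputStream code. The sketch assumes the failure counter is local to each read call and that a checksum error always moves the reader to another replica; when seekToNewSource() reports that no other replica exists, the loop gives up even though failures is still below the limit.

```java
import java.io.IOException;

public class RetrySketch {

    // Hypothetical stand-in for Hadoop's org.apache.hadoop.fs.ChecksumException,
    // so the sketch compiles without Hadoop on the classpath.
    public static class ChecksumError extends IOException {
        public ChecksumError(String msg) { super(msg); }
    }

    // Hypothetical abstraction over "the replica currently being read".
    public interface Source {
        // Reads from the current datanode.
        int read() throws IOException;

        // Switches to a different datanode; returns false when none is left.
        boolean seekToNewSource();
    }

    /**
     * Retries a read on checksum errors, at most maxFailures times in total,
     * but stops early once seekToNewSource() says no other replica is available.
     * IOExceptions other than ChecksumError propagate unchanged.
     */
    public static int readWithRetries(Source src, int maxFailures) throws IOException {
        int failures = 0; // local to this call, so concurrent readers never reset each other's count
        while (true) {
            try {
                return src.read();
            } catch (ChecksumError e) {
                failures++;
                // The same node is never retried after a checksum error:
                // either switch to another replica or give up.
                if (failures >= maxFailures || !src.seekToNewSource()) {
                    throw e;
                }
            }
        }
    }
}
```

With two replicas that are both corrupt and maxFailures = 5, this loop throws after two attempts rather than five, which is the behaviour described above.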
{{testSeekToNewSourcePastFileSize}} and {{testNegativeSeekToNewSource}}, added to {{TestSeekBug}}, call {{FSDataInputStream#seekToNewSource}} just after opening the file. This causes a NullPointerException because currentNode is not yet set in DFSInputStream. The tests passed after fixing this.
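A minimal sketch of one way to guard seekToNewSource() when no current node has been chosen yet. SeekSketch, chooseNode and the string replica names are hypothetical simplifications, not the actual DFSInputStream fields or the fix that was applied. Falling back to an ordinary first seek is an assumption; returning false would be another reasonable choice.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeekSketch {
    private final List<String> replicas;               // hypothetical replica names, in preference order
    private final Set<String> deadNodes = new HashSet<>();
    private String currentNode;                        // null until the first read or seek

    public SeekSketch(List<String> replicas) {
        this.replicas = replicas;
    }

    public String currentNode() {
        return currentNode;
    }

    // Returns the first replica not in deadNodes, or null if there is none.
    private String chooseNode() {
        for (String r : replicas) {
            if (!deadNodes.contains(r)) {
                return r;
            }
        }
        return null;
    }

    // Stand-in for a read: lazily picks a node the first time it is called.
    public void read() {
        if (currentNode == null) {
            currentNode = chooseNode();
        }
    }

    public boolean seekToNewSource() {
        if (currentNode == null) {
            // Called right after open, before any node was chosen. The report
            // says this case threw a NullPointerException in DFSInputStream;
            // here it is handled explicitly by doing an ordinary first seek.
            currentNode = chooseNode();
            return currentNode != null;
        }
        String oldNode = currentNode;
        deadNodes.add(oldNode);          // exclude the current node for this lookup only
        String newNode = chooseNode();
        deadNodes.remove(oldNode);
        if (newNode == null) {
            return false;                // no other replica: keep using the old node
        }
        currentNode = newNode;
        return true;
    }
}
```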
I am closing this issue as Won't Fix.
> Fix some issue in DFSInputstream
> --------------------------------
>
> Key: HDFS-4273
> URL: https://issues.apache.org/jira/browse/HDFS-4273
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.2-alpha
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Priority: Minor
> Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch,
> HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch,
> HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java
>
>
> The following issues in DFSInputStream are addressed in this JIRA:
> 1. read may not retry enough times in some cases, causing an early failure.
> Assume the following call sequence:
> {noformat}
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clear deadNodes and pick the currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception and may fail the read call before it has tried MaxBlockAcquireFailures times.
> {noformat}
> 2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race
> condition: it can be reset to 0 while another thread is still using it, so some
> read threads may never quit. Changing failures to a local variable solves this
> issue.
> 3. If the local datanode is added to deadNodes, it is not removed from deadNodes
> when the DN comes back alive. We need a way to remove the local datanode from
> deadNodes once it becomes live again.
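Item 3 asks for a way to let a recovered datanode leave deadNodes. One possible approach, swapped in here purely for illustration, is time-based expiry: a node marked dead is skipped only for a fixed window and is then tried again. The class ExpiringDeadNodes, its methods and the injected clock are hypothetical and not part of HDFS; a real client might instead clear the entry when it learns that the datanode is live.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class ExpiringDeadNodes {
    private final Map<String, Long> deadSince = new HashMap<>(); // node -> time it was marked dead
    private final long expiryMillis;
    private final LongSupplier clock;  // injected so tests can control time

    public ExpiringDeadNodes(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    public synchronized void markDead(String node) {
        deadSince.put(node, clock.getAsLong());
    }

    // True while the node is inside its expiry window; expired entries are dropped
    // so the node (for example the local datanode) gets another chance.
    public synchronized boolean isDead(String node) {
        Long since = deadSince.get(node);
        if (since == null) {
            return false;
        }
        if (clock.getAsLong() - since >= expiryMillis) {
            deadSince.remove(node);
            return false;
        }
        return true;
    }
}
```

Injecting the clock as a LongSupplier keeps the expiry logic testable; in production code it would typically be System::currentTimeMillis or a monotonic source such as System.nanoTime() converted to milliseconds.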