[ 
https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511016#comment-13511016
 ] 

Binglin Chang commented on HDFS-4273:
-------------------------------------

bq. I do not think the DFSInputstream#read will be used concurrently.
Yes, "read(byte[] buffer, int offset, int length)" will not be used 
concurrently. But given the classic semantics of pread, "int read(long 
position, byte[] buffer, int offset, int length)", and since nothing implies 
it is "synchronized", I thought HBase may be using pread concurrently, right?
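
To illustrate what "classic pread semantics" means here, the following is a minimal sketch using a local file and java.nio's FileChannel rather than DFSInputStream. The class name, the helper names and the test data are hypothetical, and the sketch only assumes that a positional read takes an explicit offset and leaves the shared stream position alone, which is what makes concurrent callers plausible.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical local-file analogy; this is not HDFS client code.
class ConcurrentPreadSketch {

    // Positional read: reads at an explicit position and does not move
    // the channel's own position, so callers share no cursor state.
    static int pread(FileChannel ch, long position, byte[] buffer, int offset, int length)
            throws IOException {
        return ch.read(ByteBuffer.wrap(buffer, offset, length), position);
    }

    // Reads `chunks` consecutive chunks of `chunkSize` bytes, one thread per chunk,
    // all through the same open channel.
    static List<String> readChunksConcurrently(Path file, int chunkSize, int chunks)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                final long position = (long) i * chunkSize;
                Callable<String> task = () -> {
                    byte[] buf = new byte[chunkSize];
                    int done = 0;
                    while (done < chunkSize) {
                        int n = pread(ch, position + done, buf, done, chunkSize - done);
                        if (n < 0) break; // end of file
                        done += n;
                    }
                    return new String(buf, 0, done, StandardCharsets.US_ASCII);
                };
                futures.add(pool.submit(task));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) out.add(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }

    // Writes a small temporary file and reads it back as four concurrent chunks.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("pread", ".txt");
            try {
                Files.write(tmp, "AAAABBBBCCCCDDDD".getBytes(StandardCharsets.US_ASCII));
                return readChunksConcurrently(tmp, 4, 4);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each task reads its own range without any locking, because no task depends on a cursor that another task could move. Whether DFSInputStream's pread path is safe under the same usage is exactly the question raised above.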

bq. Also, to clear deadNodes looks reasonable...
I'm not against clearing deadNodes and retrying; on the contrary, the current 
logic may not retry enough and may quit early.
                
> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: TestDFSInputStream.java
>
>
> Assume the following call logic
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a 
> different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clear deadNodes and pick the currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception, and may fail the read call before 
> it has tried MaxBlockAcquireFailures times.
> {noformat} 
> Some issues with this logic:
> 1. The seekToNewSource() logic is broken because deadNodes may be cleared in 
> the middle of it.
> 2. The variable "int retries=2" in readWithStrategy seems to conflict with 
> MaxBlockAcquireFailures; should it be removed?
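
The call sequence above can be mimicked with a small toy model. This is a sketch under stated assumptions, not the Hadoop source: the method names follow the trace, but the bodies, the single always-failing replica "dn1" and the limit of 3 are invented for illustration.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Toy model of the trace in the issue description; not the real DFSInputStream.
class ReadRetryModel {
    final String onlyReplica = "dn1";   // assumption: the block has one replica
    final int maxBlockAcquireFailures;  // assumption: limit enforced in chooseDataNode()
    final Set<String> deadNodes = new HashSet<>();
    String currentNode;
    int failures = 0;      // acquire failures counted so far
    int readAttempts = 0;  // how often the replica was actually read

    ReadRetryModel(int maxBlockAcquireFailures) {
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
    }

    // If every replica is marked dead: count a failure, clear deadNodes, pick again.
    String chooseDataNode() throws IOException {
        if (!deadNodes.contains(onlyReplica)) return onlyReplica;
        if (failures >= maxBlockAcquireFailures) throw new IOException("block missing");
        failures++;
        deadNodes.clear();
        return onlyReplica;
    }

    void blockSeekTo() throws IOException {
        currentNode = chooseDataNode();
    }

    // Marks the current node dead and asks for another one; reports whether it changed.
    boolean seekToNewSource() throws IOException {
        String old = currentNode;
        deadNodes.add(old);
        blockSeekTo();
        return !currentNode.equals(old);
    }

    // In this model the replica always fails.
    int doRead() throws IOException {
        readAttempts++;
        throw new IOException("read error from " + currentNode);
    }

    int readBuffer() throws IOException {
        while (true) {
            try {
                return doRead();
            } catch (IOException e) {
                if (!seekToNewSource()) throw e; // same node again: give up
            }
        }
    }

    int readWithStrategy() throws IOException {
        int retries = 2; // the hard-coded value questioned in item 2
        while (true) {
            try {
                blockSeekTo();
                return readBuffer();
            } catch (IOException e) {
                if (--retries == 0) throw e;
            }
        }
    }

    // Runs one failing read and returns {failures, readAttempts}.
    static int[] simulate(int maxBlockAcquireFailures) {
        ReadRetryModel m = new ReadRetryModel(maxBlockAcquireFailures);
        try {
            m.readWithStrategy();
        } catch (IOException expected) {
            // the read is expected to fail in this model
        }
        return new int[] { m.failures, m.readAttempts };
    }

    public static void main(String[] args) {
        int[] r = simulate(3);
        System.out.println("failures=" + r[0] + ", readAttempts=" + r[1]);
    }
}
```

In this model, with a limit of 3, the read call gives up after two read attempts and two counted acquire failures. The local "retries" counter ends the loop before the configured limit is reached, which matches the early failure described in the trace.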
