[jira] [Updated] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

Binglin Chang (JIRA) Wed, 05 Dec 2012 07:06:58 -0800

     [ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Binglin Chang updated HDFS-4273:
--------------------------------

    Attachment: TestDFSInputStream.java

I wrote a test to reproduce the scenario; here is the log:
{noformat}
2012-12-05 22:55:15,135 WARN  hdfs.DFSClient 
(DFSInputStream.java:readBuffer(596)) - Found Checksum error for 
BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from 
127.0.0.1:50099 at 0
2012-12-05 22:55:15,136 INFO  DataNode.clienttrace 
(BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest: 
/127.0.0.1:50105, bytes: 4128, op: HDFS_READ, cliID: 
DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID: 
DS-91625336-192.168.0.101-50099-1354719314603, blockid: 
BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002, duration: 
2925000
2012-12-05 22:55:15,136 INFO  hdfs.DFSClient 
(DFSInputStream.java:chooseDataNode(741)) - Could not obtain 
BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from any 
node: java.io.IOException: No live nodes contain current block. Will get new 
block locations from namenode and retry...
2012-12-05 22:55:15,136 WARN  hdfs.DFSClient 
(DFSInputStream.java:chooseDataNode(756)) - DFS chooseDataNode: got # 1 
IOException, will wait for 274.34891931868265 msec.
2012-12-05 22:55:15,413 INFO  DataNode.clienttrace 
(BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest: 
/127.0.0.1:50106, bytes: 4128, op: HDFS_READ, cliID: 
DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID: 
DS-91625336-192.168.0.101-50099-1354719314603, blockid: 
BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002, duration: 
283000
2012-12-05 22:55:15,414 INFO  hdfs.StateChange 
(FSNamesystem.java:reportBadBlocks(4761)) - *DIR* reportBadBlocks
2012-12-05 22:55:15,415 INFO  BlockStateChange 
(CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK 
NameSystem.addToCorruptReplicasMap: blk_-705068286766485620 added as corrupt on 
127.0.0.1:50099 by null because client machine reported it
2012-12-05 22:55:15,416 INFO  hdfs.TestClientReportBadBlock 
(TestDFSInputStream.java:testDFSInputStreamReadRetryTime(94)) - catch 
IOExceptionorg.apache.hadoop.fs.ChecksumException: Checksum error: /testFile at 
0 exp: 809972010 got: -1374622118
2012-12-05 22:55:15,431 INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1411)) - Shutting down the Mini HDFS Cluster
{noformat}

                
> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: TestDFSInputStream.java
>
>
> Assume the following call sequence:
> {noformat} 
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource(): adds currentNode to deadNodes, hoping to get a 
> different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing: clears deadNodes and picks currentNode again
>         seekToNewSource() returns false
>      readBuffer() rethrows the exception and quits the loop
> readWithStrategy() gets the exception and may fail the read call before 
> MaxBlockAcquireFailures attempts have been made.
> {noformat} 
> Some issues with this logic:
> 1. seekToNewSource() is broken because deadNodes may be cleared in the 
> middle of the call.
> 2. The variable "int retries=2" in readWithStrategy() seems to conflict with 
> MaxBlockAcquireFailures; should it be removed?
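
To make issue 1 concrete, here is a minimal, self-contained sketch of the interaction described above. It is not the real DFSInputStream code: the class ReadRetrySketch, its fields, and the node name "dn1" are hypothetical stand-ins, and the model assumes a single replica, as in the attached test. It only illustrates how clearing deadNodes inside chooseDataNode() can make seekToNewSource() land on the same node and return false.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the retry path; names mirror the issue text, but the
// bodies are assumptions made for illustration, not the Hadoop implementation.
class ReadRetrySketch {
    final List<String> replicas;                    // datanodes holding the block
    final Set<String> deadNodes = new HashSet<>();  // nodes the client has given up on
    String currentNode;

    ReadRetrySketch(List<String> replicas) {
        this.replicas = replicas;
        this.currentNode = replicas.get(0);
    }

    // Models chooseDataNode(): pick the first replica not marked dead.
    // If every replica is dead ("block missing"), forget all failures and
    // start over, which is the behaviour the issue describes.
    String chooseDataNode() {
        for (String node : replicas) {
            if (!deadNodes.contains(node)) {
                return node;
            }
        }
        deadNodes.clear();
        return replicas.get(0);
    }

    // Models seekToNewSource(): mark the current node dead, then ask for
    // another one. Returns true only if a different node was chosen.
    boolean seekToNewSource() {
        String oldNode = currentNode;
        deadNodes.add(oldNode);
        String newNode = chooseDataNode();
        currentNode = newNode;
        return !newNode.equals(oldNode);
    }

    public static void main(String[] args) {
        ReadRetrySketch stream = new ReadRetrySketch(Arrays.asList("dn1"));
        boolean switched = stream.seekToNewSource();
        System.out.println("switched=" + switched + " deadNodes=" + stream.deadNodes);
    }
}
```

In this model, with one replica, seekToNewSource() returns false and deadNodes ends up empty, so the caller sees no new source and the record of the failed node is lost. In the scenario above, that is the point at which readBuffer() rethrows the exception.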

