[
https://issues.apache.org/jira/browse/HDFS-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Binglin Chang resolved HDFS-4271.
---------------------------------
Resolution: Duplicate
I didn't realize I had created 3 issues because of a bad internet connection.
> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
> Key: HDFS-4271
> URL: https://issues.apache.org/jira/browse/HDFS-4271
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Priority: Minor
>
> Assume the following call logic
> {noformat}
> readWithStrategy()
> -> blockSeekTo()
> -> readBuffer()
> -> reader.doRead()
> -> seekToNewSource() adds currentNode to deadNodes, hoping to get a
> different datanode
> -> blockSeekTo()
> -> chooseDataNode()
> -> block missing, clear deadNodes and pick the currentNode again
> seekToNewSource() return false
> readBuffer() re-throw the exception quit loop
> readWithStrategy() gets the exception and may fail the read call before
> MaxBlockAcquireFailures retries have been attempted.
> {noformat}
> Some issues with this logic:
> 1. The seekToNewSource() logic is broken, because deadNodes may be cleared
> in the middle of it.
> 2. The variable "int retries=2" in readWithStrategy() seems to conflict with
> MaxBlockAcquireFailures; should it be removed?
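The interaction above can be illustrated with a minimal, self-contained sketch. This is not the real DFSInputStream code; the method and field names (deadNodes, chooseDataNode, seekToNewSource) only mirror it, and the single-replica setup is an assumption chosen to trigger the bad path:

```java
import java.util.*;

// Hypothetical, simplified model of the retry interaction described above.
public class RetrySketch {
    static List<String> allNodes = Arrays.asList("dn1"); // single replica
    static Set<String> deadNodes = new HashSet<>();
    static String currentNode = "dn1";

    // Mirrors chooseDataNode(): if every replica is marked dead, it clears
    // deadNodes and retries from scratch -- possibly picking the same node.
    static String chooseDataNode() {
        for (String dn : allNodes) {
            if (!deadNodes.contains(dn)) return dn;
        }
        deadNodes.clear();            // the problematic reset
        return allNodes.get(0);
    }

    // Mirrors seekToNewSource(): marks the current node dead and asks for a
    // different one; returns false if it gets the same node back.
    static boolean seekToNewSource() {
        deadNodes.add(currentNode);
        String newNode = chooseDataNode();
        boolean switched = !newNode.equals(currentNode);
        currentNode = newNode;
        return switched;
    }

    public static void main(String[] args) {
        // With a single (corrupt) replica, seekToNewSource() gets the same
        // node back and returns false immediately, so readBuffer() rethrows
        // and the read fails before MaxBlockAcquireFailures retries are used.
        System.out.println(seekToNewSource());  // prints "false"
    }
}
```

In this model the deadNodes reset inside chooseDataNode() silently undoes the marking that seekToNewSource() just made, which is why the caller sees "no new source" after only one attempt.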
> I wrote a test to reproduce the scenario; here is part of the log:
> {noformat}
> 2012-12-05 22:55:15,135 WARN hdfs.DFSClient
> (DFSInputStream.java:readBuffer(596)) - Found Checksum error for
> BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from
> 127.0.0.1:50099 at 0
> 2012-12-05 22:55:15,136 INFO DataNode.clienttrace
> (BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest:
> /127.0.0.1:50105, bytes: 4128, op: HDFS_READ, cliID:
> DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID:
> DS-91625336-192.168.0.101-50099-1354719314603, blockid:
> BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002,
> duration: 2925000
> 2012-12-05 22:55:15,136 INFO hdfs.DFSClient
> (DFSInputStream.java:chooseDataNode(741)) - Could not obtain
> BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002 from any
> node: java.io.IOException: No live nodes contain current block. Will get new
> block locations from namenode and retry...
> 2012-12-05 22:55:15,136 WARN hdfs.DFSClient
> (DFSInputStream.java:chooseDataNode(756)) - DFS chooseDataNode: got # 1
> IOException, will wait for 274.34891931868265 msec.
> 2012-12-05 22:55:15,413 INFO DataNode.clienttrace
> (BlockSender.java:sendBlock(672)) - src: /127.0.0.1:50099, dest:
> /127.0.0.1:50106, bytes: 4128, op: HDFS_READ, cliID:
> DFSClient_NONMAPREDUCE_-1488457569_1, offset: 0, srvID:
> DS-91625336-192.168.0.101-50099-1354719314603, blockid:
> BP-50712310-192.168.0.101-1354719313473:blk_-705068286766485620_1002,
> duration: 283000
> 2012-12-05 22:55:15,414 INFO hdfs.StateChange
> (FSNamesystem.java:reportBadBlocks(4761)) - *DIR* reportBadBlocks
> 2012-12-05 22:55:15,415 INFO BlockStateChange
> (CorruptReplicasMap.java:addToCorruptReplicasMap(66)) - BLOCK
> NameSystem.addToCorruptReplicasMap: blk_-705068286766485620 added as corrupt
> on 127.0.0.1:50099 by null because client machine reported it
> 2012-12-05 22:55:15,416 INFO hdfs.TestClientReportBadBlock
> (TestDFSInputStream.java:testDFSInputStreamReadRetryTime(94)) - catch
> IOExceptionorg.apache.hadoop.fs.ChecksumException: Checksum error: /testFile
> at 0 exp: 809972010 got: -1374622118
> 2012-12-05 22:55:15,431 INFO hdfs.MiniDFSCluster
> (MiniDFSCluster.java:shutdown(1411)) - Shutting down the Mini HDFS Cluster
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira