[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878380#comment-13878380 ]
ikweesung commented on HDFS-5809:
---------------------------------
In BlockPoolSliceScanner, I think the following code may be causing this problem:

    if (((now - getEarliestScanTime()) >= scanPeriod)
        || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
      verifyFirstBlock();
    } else {
      ....
    }

Three weeks after the last block pool scan, the condition
(now - getEarliestScanTime()) >= scanPeriod becomes true. At that point, a race
between the scanner and an HDFS append may send the datanode into an infinite
loop, because the code that updates the earliest scan time comes after the code
that throws the FileNotFoundException (FNFE). The update is presumably skipped,
so the condition stays true on every pass.
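
To illustrate the ordering problem described above, here is a minimal sketch. It is a simplified model, not the actual Hadoop code: the class ScanLoopSketch, its fields, and the methods verifyMissingBlock, scanOnceBuggy and scanOnceFixed are all hypothetical names, and the time values are arbitrary. It assumes the FNFE is caught somewhere above the update, so the earliest scan time never advances. The "fixed" variant, which moves the update into a finally block, is only one possible remedy and is my assumption, not the fix adopted by the project.

```java
import java.io.FileNotFoundException;

// Simplified, hypothetical model of the scan loop; not the real BlockPoolSliceScanner.
class ScanLoopSketch {
    static final long SCAN_PERIOD = 10; // arbitrary units, stands in for scanPeriod
    long earliestScanTime = 0;          // stands in for getEarliestScanTime()
    int attempts = 0;                   // how many times verification was tried

    // Models verifying a block whose file has gone away (e.g. due to a racing append).
    void verifyMissingBlock() throws FileNotFoundException {
        attempts++;
        throw new FileNotFoundException("block file not found");
    }

    // Buggy ordering: the scan time update sits after the call that throws,
    // so it is skipped every time.
    void scanOnceBuggy(long now) {
        try {
            verifyMissingBlock();
            earliestScanTime = now; // never reached
        } catch (FileNotFoundException e) {
            // exception handled, but earliestScanTime is left unchanged
        }
    }

    // One possible remedy (an assumption): always advance the scan time.
    void scanOnceFixed(long now) {
        try {
            verifyMissingBlock();
        } catch (FileNotFoundException e) {
            // exception handled
        } finally {
            earliestScanTime = now; // runs even when verification fails
        }
    }

    // Loops while the scan is "due"; the cap keeps the buggy case from spinning forever.
    static int run(boolean fixed, int cap) {
        ScanLoopSketch s = new ScanLoopSketch();
        long now = 100;
        while ((now - s.earliestScanTime) >= SCAN_PERIOD && s.attempts < cap) {
            if (fixed) {
                s.scanOnceFixed(now);
            } else {
                s.scanOnceBuggy(now);
            }
        }
        return s.attempts;
    }

    public static void main(String[] args) {
        System.out.println("buggy attempts: " + run(false, 1000)); // stops only at the cap
        System.out.println("fixed attempts: " + run(true, 1000));  // stops after one attempt
    }
}
```

In the buggy variant the loop ends only because of the artificial cap. Without it, the condition would stay true forever, which matches the behaviour I see on the datanode.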
When I changed scanPeriod to 1 hour, the datanode fell into the infinite loop
within several minutes.
When I changed scanPeriod back to the default of 504 hours, the datanode ran
for a long time without falling into the infinite loop.
> BlockPoolSliceScanner make datanode to drop into infinite loop
> --------------------------------------------------------------
>
> Key: HDFS-5809
> URL: https://issues.apache.org/jira/browse/HDFS-5809
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.0.0-alpha
> Environment: jdk1.6, centos6.4
> Reporter: ikweesung
> Priority: Critical
> Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the Hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks
> in my cluster.
> Then, at random, one datanode falls into an infinite loop, as the log shows, and
> eventually all datanodes fall into the infinite loop.
> Each datanode fails verification on just one block.
> When I check the failing block like this: hadoop fsck / -files -blocks | grep
> blk_1223474551535936089_4702249, no HDFS file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method is where
> the infinite loop occurs.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
> && !datanode.blockScanner.blockScannerThread.isInterrupted()
> && datanode.isBPServiceAlive(blockPoolId)) { ....
> The log message is ultimately printed in the method verifyBlock (BlockPoolSliceScanner:453).
> Please excuse my poor English.