[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner make datanode to drop into infinite loop

ikweesung (JIRA) Tue, 21 Jan 2014 23:56:26 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878380#comment-13878380
 ]


ikweesung commented on HDFS-5809:
---------------------------------

In my opinion, the following code in BlockPoolSliceScanner may cause this
problem:

    if (((now - getEarliestScanTime()) >= scanPeriod)
        || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
      verifyFirstBlock();
    } else {
      ....
    }

Three weeks after the last block pool scan, the condition
(now - getEarliestScanTime()) >= scanPeriod becomes true. At that point, a race
between scanning and an HDFS append may send the datanode into an infinite
loop, because the code that updates the earliest scan time comes after the code
that throws the FileNotFoundException (FNFE). When the exception is thrown, the
update is skipped, so the condition stays true on the next pass.

When I changed scanPeriod to 1 hour, the datanode dropped into the infinite
loop within several minutes.
When I then changed scanPeriod back to the default of 504 hours, the datanode
did not drop into the infinite loop for a long time.
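
To illustrate the ordering problem described above, here is a minimal,
self-contained sketch. It is not the actual Hadoop code: the class
ScanLoopSketch, its methods, and the fixed clock value are all hypothetical
stand-ins, and the second variant (updating the scan time in a finally block)
is only one possible ordering, not necessarily how the issue should be fixed.
It only shows why a scan-time update placed after a throwing call leaves the
loop condition true forever.

```java
import java.io.FileNotFoundException;

// Hypothetical sketch; names do not match the real BlockPoolSliceScanner.
public class ScanLoopSketch {
    static final long SCAN_PERIOD_MS = 1000; // assumed scan period
    long earliestScanTime = 0;               // stand-in for getEarliestScanTime()

    // Stand-in for block verification. It always fails, as if the block
    // file had been moved by a concurrent append.
    void verify() throws FileNotFoundException {
        throw new FileNotFoundException("block file missing");
    }

    // Ordering described in the comment: the scan time is updated only
    // after verification, so a thrown exception skips the update.
    void verifyFirstBlockUpdateAfter(long now) {
        try {
            verify();
            earliestScanTime = now; // never reached when verify() throws
        } catch (FileNotFoundException e) {
            // the failure is swallowed here; the real code logs it
        }
    }

    // Alternative ordering (an assumption, not the committed fix): the
    // scan time is updated whether or not verification succeeds.
    void verifyFirstBlockUpdateAlways(long now) {
        try {
            verify();
        } catch (FileNotFoundException e) {
            // the failure is swallowed here; the real code logs it
        } finally {
            earliestScanTime = now;
        }
    }

    // Runs a bounded version of the scan loop with a fixed clock and
    // counts how often the "scan is due" condition fires.
    static int countAttempts(boolean updateAfter, int maxIterations) {
        ScanLoopSketch s = new ScanLoopSketch();
        long now = 5000; // fixed clock value, larger than SCAN_PERIOD_MS
        int attempts = 0;
        for (int i = 0; i < maxIterations; i++) {
            if (now - s.earliestScanTime >= SCAN_PERIOD_MS) {
                attempts++;
                if (updateAfter) {
                    s.verifyFirstBlockUpdateAfter(now);
                } else {
                    s.verifyFirstBlockUpdateAlways(now);
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("update after verify: " + countAttempts(true, 100));
        System.out.println("update always:       " + countAttempts(false, 100));
    }
}
```

With the update placed after the throwing call, every one of the 100
iterations retries the same block; with the update in the finally block, only
the first iteration does. A shorter scanPeriod makes the condition become true
sooner, which would be consistent with the loop appearing within minutes at
1 hour.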

> BlockPoolSliceScanner make datanode to drop into infinite loop
> --------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4
>            Reporter: ikweesung
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the Hadoop cluster starts, BlockPoolSliceScanner starts scanning the
> blocks in my cluster.
> Then, at random, one datanode drops into an infinite loop, as the log shows,
> and finally all datanodes drop into an infinite loop.
> Each datanode fails verification on just one block.
> When I check the failing block like this: hadoop fsck / -files -blocks | grep
> blk_1223474551535936089_4702249, no HDFS file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method drops
> into an infinite loop.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
> && !datanode.blockScanner.blockScannerThread.isInterrupted()
> && datanode.isBPServiceAlive(blockPoolId)) { ....
> The last log message is printed in the method verifyBlock
> (BlockPoolSliceScanner:453).
> Please excuse my poor English.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
