[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ikweesung updated HDFS-5809:
----------------------------

    Description: 
Hello, everyone.

When the Hadoop cluster starts, BlockPoolSliceScanner begins scanning the blocks
in my cluster.
Then one datanode, seemingly at random, falls into an infinite loop, as the log
below shows, and eventually every datanode falls into the same loop.
On each datanode, verification keeps failing for a single block.
When I look up the failing block with: hadoop fsck / -files -blocks | grep
blk_1223474551535936089_4702249, no HDFS file contains the block.

It seems that the while loop in BlockPoolSliceScanner's scan method never
terminates.
BlockPoolSliceScanner: 650

while (datanode.shouldRun
    && !datanode.blockScanner.blockScannerThread.isInterrupted()
    && datanode.isBPServiceAlive(blockPoolId)) { ....

The log message is ultimately printed in the verifyBlock method
(BlockPoolSliceScanner:453).

Please excuse my poor English.
-------------------------------------------------------------------------------------------------------------------------------------------------
LOG: 
2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write

  was:
Hello, everyone.

When the Hadoop cluster starts, BlockPoolSliceScanner begins scanning the blocks
in my cluster.
Then one datanode, seemingly at random, falls into an infinite loop, as the log
shows, and eventually every datanode falls into the same loop.
On each datanode, verification keeps failing for a single block.
When I look up the failing block with: hadoop fsck / -files -blocks | grep
blk_1223474551535936089_4702249, no HDFS file contains the block.

It seems that the while loop in BlockPoolSliceScanner's scan method never
terminates.
BlockPoolSliceScanner: 650

while (datanode.shouldRun
    && !datanode.blockScanner.blockScannerThread.isInterrupted()
    && datanode.isBPServiceAlive(blockPoolId)) { ....

The log message is ultimately printed in the verifyBlock method
(BlockPoolSliceScanner:453).

Please excuse my poor English.


> BlockPoolSliceScanner make datanode to drop into infinite loop
> --------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4
>            Reporter: ikweesung
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>
> Hello, everyone.
> When the Hadoop cluster starts, BlockPoolSliceScanner begins scanning the
> blocks in my cluster.
> Then one datanode, seemingly at random, falls into an infinite loop, as the
> log below shows, and eventually every datanode falls into the same loop.
> On each datanode, verification keeps failing for a single block.
> When I look up the failing block with: hadoop fsck / -files -blocks | grep
> blk_1223474551535936089_4702249, no HDFS file contains the block.
> It seems that the while loop in BlockPoolSliceScanner's scan method never
> terminates.
> BlockPoolSliceScanner: 650
> while (datanode.shouldRun
>     && !datanode.blockScanner.blockScannerThread.isInterrupted()
>     && datanode.isBPServiceAlive(blockPoolId)) { ....
> The log message is ultimately printed in the verifyBlock method
> (BlockPoolSliceScanner:453).
> Please excuse my poor English.
> -------------------------------------------------------------------------------------------------------------------------------------------------
> LOG: 
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
> 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write
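
For anyone triaging a similar log, the repeated "Verification failed for"
entries shown above can be tallied per block to see whether one block is being
retried over and over. The following is only a minimal sketch, not Hadoop code:
the class name VerificationFailureCounter and its countFailures method are
hypothetical, and it assumes Java 8 or later and raw (unquoted) datanode log
text in the format of the excerpt above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; not part of Hadoop.
public class VerificationFailureCounter {

    // Captures the token that follows "Verification failed for".
    // \s+ is used between words so entries wrapped across lines still match.
    private static final Pattern FAILED =
        Pattern.compile("Verification\\s+failed\\s+for\\s+(\\S+)");

    // Two wrapped entries copied from the excerpt, used when no file is given.
    private static final String SAMPLE =
        "2014-01-21 18:36:50,582 INFO BlockPoolSliceScanner: Verification \n"
        + "failed for \n"
        + "BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may \n"
        + "be due to race with write\n"
        + "2014-01-21 18:36:50,582 INFO BlockPoolSliceScanner: Verification \n"
        + "failed for \n"
        + "BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may \n"
        + "be due to race with write\n";

    /** Returns, in first-seen order, how many failure entries each block has. */
    public static Map<String, Integer> countFailures(String logText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = FAILED.matcher(logText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the log file named on the command line, or fall back to SAMPLE.
        String log = args.length > 0
            ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
            : SAMPLE;
        countFailures(log).forEach(
            (block, n) -> System.out.println(n + "\t" + block));
    }
}
```

A block whose count keeps growing while the timestamps barely advance matches
the symptom reported here. The sketch only summarizes the log; it does not
explain why the scanner keeps retrying the block.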



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
