[
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105100#comment-14105100
]
Yongjun Zhang commented on HDFS-6833:
-------------------------------------
Hi [~sinchii],
Good catch on the synchronization issue. I have a few minor comments.
- Did you figure out exactly what contributed to the
ConcurrentModificationException? I saw you changed the incorrect
remove-while-iterate code from the previous revision.
- change {{", deleting blocks:" + deletingBlocks;}} to {{", to-be-deleted
blocks: " + deletingBlocks;}}
- similarly, change "deleting" in {{ LOG.info("Block file " +
blockpoolReport[d].getBlockFile() + " is deleting");}} to {{is to be deleted}}
- move {{statsRecord.deletingBlocks++;}} to before
{{deletingBlockIds.add(info.getBlockId());}} and after {{LOG.info(...);}} in
the else block, so as to be consistent with the code in the if branch.
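For reference, here is a minimal standalone illustration of the remove-while-iterate pattern mentioned in the first point. It assumes a plain java.util.ArrayList rather than the actual DirectoryScanner data structures, and the class and method names are hypothetical. It only shows the single-threaded way a ConcurrentModificationException arises and the usual fix of removing through the Iterator; it does not establish what caused the exception seen with the patch.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class; not part of HDFS.
public class RemoveWhileIterate {

    // Broken: removes through the List while a for-each loop is iterating it.
    // ArrayList's fail-fast iterator notices the structural change on the
    // next call to next() and throws ConcurrentModificationException.
    // Returns true if that exception was observed.
    static boolean brokenRemoveEven(List<Long> blockIds) {
        try {
            for (Long id : blockIds) {
                if (id % 2 == 0) {
                    blockIds.remove(id); // List.remove(Object), not via the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe: removes through the same Iterator that is doing the traversal.
    static void safeRemoveEven(List<Long> blockIds) {
        Iterator<Long> it = blockIds.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Long> a = new ArrayList<>(List.of(1L, 2L, 3L, 4L, 5L));
        System.out.println("broken version threw CME: " + brokenRemoveEven(a));

        List<Long> b = new ArrayList<>(List.of(1L, 2L, 3L, 4L, 5L));
        safeRemoveEven(b);
        System.out.println("safe version result: " + b);
    }
}
```

Note that ArrayList's fail-fast detection is best-effort, so the broken loop is a bug even when it does not throw: removing the second-to-last element, for example, ends the loop early and silently skips the last element.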
Thanks.
> DirectoryScanner should not register a deleting block with memory of DataNode
> -----------------------------------------------------------------------------
>
> Key: HDFS-6833
> URL: https://issues.apache.org/jira/browse/HDFS-6833
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.5.0
> Reporter: Shinichi Yamashita
> Assignee: Shinichi Yamashita
> Attachments: HDFS-6833-6-2.patch, HDFS-6833-6.patch,
> HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.patch, HDFS-6833.patch,
> HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch
>
>
> When a block is deleted on a DataNode, the following messages are normally
> logged.
> {code}
> 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation DirectoryScanner may run while the
> DataNode is deleting the block, in which case the following messages are output.
> {code}
> 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
> 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
> getNumBytes() = 21230663
> getBytesOnDisk() = 21230663
> getVisibleLength()= 21230663
> getVolume() = /hadoop/data1/dfs/data/current
> getBlockFile() = /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> unlinked =false
> 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> As a result, the block that is being deleted is registered again in the
> DataNode's memory, and when the DataNode sends a block report, the NameNode
> receives incorrect block information.
> For example, when we recommission a node or change the replication factor,
> this problem may cause the NameNode to delete a valid replica as
> "ExcessReplicate".
> "Under-Replicated Blocks" and "Missing Blocks" then occur.
> When the DataNode runs DirectoryScanner, it should not register a block that
> is being deleted.
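A minimal sketch of the behavior the description asks for, assuming a simplified model in which the dataset keeps a set of block IDs scheduled for asynchronous deletion. DeletingBlockGuard, replicaMap, scheduleDeletion, onFileDeleted, reconcile, and skippedDeletingBlocks are hypothetical names, not the FsDatasetImpl or DirectoryScanner API, and the attached patches may implement this differently. The scanner re-registers a block found on disk but missing from memory only if that block is not marked as deleting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the HDFS-6833 race; not real HDFS code.
public class DeletingBlockGuard {

    // In-memory replica map: block ID -> block file name.
    final Map<Long, String> replicaMap = new ConcurrentHashMap<>();

    // Block IDs scheduled for asynchronous deletion whose files may still be on disk.
    final Set<Long> deletingBlockIds = ConcurrentHashMap.newKeySet();

    // Number of on-disk blocks the scanner skipped because they were being deleted.
    int skippedDeletingBlocks = 0;

    // Invalidate a block: mark it as deleting first, then drop it from memory,
    // so a concurrent reconcile() does not see it as missing from both.
    void scheduleDeletion(long blockId) {
        deletingBlockIds.add(blockId);
        replicaMap.remove(blockId);
    }

    // Called by the (hypothetical) async deletion task once the file is gone.
    void onFileDeleted(long blockId) {
        deletingBlockIds.remove(blockId);
    }

    // Scanner reconciliation: add on-disk blocks that are missing from memory,
    // unless they are scheduled for deletion. Returns the IDs that were added.
    List<Long> reconcile(Map<Long, String> onDisk) {
        List<Long> added = new ArrayList<>();
        for (Map.Entry<Long, String> e : onDisk.entrySet()) {
            long id = e.getKey();
            if (replicaMap.containsKey(id)) {
                continue; // already registered in memory
            }
            if (deletingBlockIds.contains(id)) {
                System.out.println("Block file " + e.getValue() + " is to be deleted");
                skippedDeletingBlocks++;
                continue; // do not resurrect a block that is being deleted
            }
            replicaMap.put(id, e.getValue());
            added.add(id);
        }
        return added;
    }

    public static void main(String[] args) {
        DeletingBlockGuard ds = new DeletingBlockGuard();
        ds.replicaMap.put(1073741825L, "blk_1073741825");
        ds.scheduleDeletion(1073741825L); // file is still on disk at this point

        Map<Long, String> onDisk = Map.of(1073741825L, "blk_1073741825");
        System.out.println("added to memory: " + ds.reconcile(onDisk));

        ds.onFileDeleted(1073741825L);
    }
}
```

The sketch narrows the window rather than closing every race: if the scanner's disk listing was taken before the file was removed but reconciliation runs after onFileDeleted, the block would still be re-added, so a real fix would need an additional check (for example, confirming that the file still exists).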
--
This message was sent by Atlassian JIRA
(v6.2#6252)