[
https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241705#comment-14241705
]
Yongjun Zhang commented on HDFS-6833:
-------------------------------------
Hi [~sinchii],
Thanks for your hard work and patience. I reviewed the latest rev and have a few
more comments:
* Remove the unused imports of HashSet and Set in DirectoryScanner.java.
* About the code at lines 291-304 in FsDatasetAsyncDiskService.java:
** The logic at line 291 is inverted.
** Suggest replacing lines 291-295 with:
{code}
Set<Long> blockIds = deletedBlockIds.get(block.getBlockPoolId());
if (blockIds == null) {
  blockIds = new HashSet<Long>();
  deletedBlockIds.put(block.getBlockPoolId(), blockIds);
}
blockIds.add(block.getBlockId());
{code}
** Need to create a synchronized method {{synchronized void
updateDeletedBlockId}} in FsDatasetAsyncDiskService and move this whole block
of code (lines 291-304) into it (a rough sketch follows this list).
* The functionality of {{FsDatasetImpl#removeDeletedBlocks}} needs to be moved
to ReplicaMap by adding a similar method there; access to that method should be
protected by the mutex inside ReplicaMap. Then call the new method from
{{FsDatasetImpl#removeDeletedBlocks}} as a delegation (see the second sketch
after this list).
* About your concern over not having {{private boolean scanning}}: I think the
relevant code in the prior patch (which only removes deleted blocks from
deletingBlock when this variable is true) is not correct, because the scan can
start at any time, so you would likely end up not removing some deleted blocks
from deletingBlock. About locking, we need to protect access to
{{deletingBlock}} and {{deletedBlockIds}} and keep them in sync. Would you
please elaborate if you have more concerns?
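To make the two suggestions above concrete, here are rough sketches, not code
from your patch. They assume an {{ExtendedBlock}} parameter, that ReplicaMap's
internal lock object is named {{mutex}}, and a
{{removeDeletedBlocks(String bpid, Set<Long> blockIds)}} signature; please
adapt the names and signatures to what the patch actually has.
Sketch of the synchronized method in FsDatasetAsyncDiskService:
{code}
// Sketch only: records the block as deleted for its block pool.
private synchronized void updateDeletedBlockId(ExtendedBlock block) {
  Set<Long> blockIds = deletedBlockIds.get(block.getBlockPoolId());
  if (blockIds == null) {
    blockIds = new HashSet<Long>();
    deletedBlockIds.put(block.getBlockPoolId(), blockIds);
  }
  blockIds.add(block.getBlockId());
  // ... the rest of the current lines 296-304 would move here as well ...
}
{code}
Sketch of the ReplicaMap method plus the delegation in FsDatasetImpl:
{code}
// In ReplicaMap: remove the given block IDs under ReplicaMap's own mutex.
void removeDeletedBlocks(String bpid, Set<Long> blockIds) {
  synchronized (mutex) {
    for (Long blockId : blockIds) {
      remove(bpid, blockId);  // existing ReplicaMap#remove(String, long)
    }
  }
}

// In FsDatasetImpl: removeDeletedBlocks just delegates to volumeMap.
private void removeDeletedBlocks(String bpid, Set<Long> blockIds) {
  volumeMap.removeDeletedBlocks(bpid, blockIds);
}
{code}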
Thanks.
> DirectoryScanner should not register a deleting block with memory of DataNode
> -----------------------------------------------------------------------------
>
> Key: HDFS-6833
> URL: https://issues.apache.org/jira/browse/HDFS-6833
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.5.0, 2.5.1
> Reporter: Shinichi Yamashita
> Assignee: Shinichi Yamashita
> Priority: Critical
> Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch,
> HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch,
> HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch,
> HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch,
> HDFS-6833.patch
>
>
> When a block is deleted in DataNode, the following messages are usually
> output.
> {code}
> 2014-08-07 17:53:11,606 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Scheduling blk_1073741825_1001 file
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> for deletion
> 2014-08-07 17:53:11,617 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation, DirectoryScanner may run while the
> DataNode is deleting the block, and the following messages are output.
> {code}
> 2014-08-07 17:53:30,519 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Scheduling blk_1073741825_1001 file
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> for deletion
> 2014-08-07 17:53:31,426 INFO
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
> BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata
> files:0, missing block files:0, missing blocks in memory:1, mismatched
> blocks:0
> 2014-08-07 17:53:31,426 WARN
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added
> missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
> getNumBytes() = 21230663
> getBytesOnDisk() = 21230663
> getVisibleLength()= 21230663
> getVolume() = /hadoop/data1/dfs/data/current
> getBlockFile() =
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> unlinked =false
> 2014-08-07 17:53:31,531 INFO
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
> Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file
> /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> The information of the block being deleted is registered back in the DataNode's
> memory. When the DataNode then sends a block report, the NameNode receives wrong
> block information.
> For example, when we recommission a node or change the replication factor, the
> NameNode may delete a valid block as "ExcessReplicate" because of this problem,
> and "Under-Replicated Blocks" and "Missing Blocks" occur.
> When the DataNode runs DirectoryScanner, it should not register a block that is
> being deleted.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)