[
https://issues.apache.org/jira/browse/HDFS-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shuyan Zhang resolved HDFS-17342.
---------------------------------
Hadoop Flags: Reviewed
Target Version/s: 3.5.0
Resolution: Fixed
> Fix DataNode may invalidates normal block causing missing block
> ---------------------------------------------------------------
>
> Key: HDFS-17342
> URL: https://issues.apache.org/jira/browse/HDFS-17342
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
>
> When users read a file that is being appended to, exceptions such as
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: xxx
> may occasionally occur. This can happen if one thread is reading the block
> while a writer thread is finalizing it at the same time.
> *Root cause:*
> # The reader thread obtains an RBW replica from VolumeMap, such as
> blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
> # Simultaneously, the writer thread finalizes this block, moving it from
> the RBW directory to the FINALIZE directory. The data file is moved from
> /XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
> # The reader thread attempts to open this data input stream but encounters a
> FileNotFoundException because the data file /XXX/rbw/blk_xxx or meta file
> /XXX/rbw/blk_xxx_xxx doesn't exist at this moment.
> # The reader thread treats this block as corrupt and removes the replica
> from the volume map, and the DataNode reports the deleted block to the
> NameNode.
> # The NameNode removes this replica for the block.
> # If the file's replication factor is 1, the block is reported as missing
> until this DataNode runs the DirectoryScanner again.
> As described above, the FileNotFoundException encountered by the reader
> thread is expected, because the file has been moved.
> So we need to add a double check to the invalidateMissingBlock logic that
> verifies whether the data file or meta file actually exists, so that a
> replica that was only moved is not invalidated.
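
The double check described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual Hadoop patch: the class ReplicaMapSketch, its methods, the in-memory map, and the rbw/finalized directory layout are all hypothetical stand-ins for the DataNode's volume map. The idea it shows is that, before removing a replica after a FileNotFoundException, the invalidation path looks up the replica's current location under the same lock the writer uses and skips the removal if the file exists there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Toy model of a DataNode volume map. Every name here is hypothetical.
class ReplicaMapSketch {
    // Block id -> current location of the replica's data file.
    private final Map<Long, Path> volumeMap = new HashMap<>();
    private final Path rbwDir;
    private final Path finalizedDir;

    ReplicaMapSketch(Path root) throws IOException {
        rbwDir = Files.createDirectories(root.resolve("rbw"));
        finalizedDir = Files.createDirectories(root.resolve("finalized"));
    }

    // Writer side: create an RBW replica and return the path a reader would see.
    synchronized Path createRbw(long blockId) throws IOException {
        Path data = Files.createFile(rbwDir.resolve("blk_" + blockId));
        volumeMap.put(blockId, data);
        return data;
    }

    // Writer side: move the data file and update the map under the same lock.
    synchronized void finalizeBlock(long blockId) throws IOException {
        Path src = volumeMap.get(blockId);
        if (src == null) {
            throw new IOException("unknown block " + blockId);
        }
        Path dst = finalizedDir.resolve(src.getFileName());
        Files.move(src, dst);
        volumeMap.put(blockId, dst);
    }

    // Reader side: called after a FileNotFoundException on a possibly stale path.
    // Returns true only if the replica was actually removed from the map.
    synchronized boolean invalidateMissingBlock(long blockId) {
        Path current = volumeMap.get(blockId);
        if (current == null) {
            return false; // nothing to invalidate
        }
        if (Files.exists(current)) {
            return false; // double check: the file was moved, not lost
        }
        volumeMap.remove(blockId);
        return true;
    }

    synchronized boolean contains(long blockId) {
        return volumeMap.containsKey(blockId);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dn-sketch");
        ReplicaMapSketch dn = new ReplicaMapSketch(root);

        Path stale = dn.createRbw(1L); // reader remembers the RBW path
        dn.finalizeBlock(1L);          // writer moves the file away

        System.out.println("stale path exists: " + Files.exists(stale));
        System.out.println("invalidated: " + dn.invalidateMissingBlock(1L));
        System.out.println("still in map: " + dn.contains(1L));
    }
}
```

In this sketch the replica is dropped only when the file is absent at the location the map currently records, which is the situation the DirectoryScanner would otherwise have to repair later.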
--
This message was sent by Atlassian Jira
(v8.20.10#820010)