[ 
https://issues.apache.org/jira/browse/HDFS-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808090#comment-17808090
 ] 

ASF GitHub Bot commented on HDFS-17342:
---------------------------------------

haiyang1987 opened a new pull request, #6464:
URL: https://github.com/apache/hadoop/pull/6464

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-17342
   
   When users read a file that is being appended to, occasional exceptions may 
occur, such as org.apache.hadoop.hdfs.BlockMissingException: Could not obtain 
block: xxx.
   
   This can happen if one thread is reading the block while a writer thread is 
finalizing it at the same time.
   
   **Root cause:**
   
   1. The reader thread obtains an RBW replica from the VolumeMap, such as 
blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
   2. Simultaneously, the writer thread finalizes this block, moving it from 
the RBW directory to the FINALIZE directory. The data file is moved from 
/XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
   3. The reader thread attempts to open the data input stream but encounters 
a FileNotFoundException because the data file /XXX/rbw/blk_xxx or meta file 
/XXX/rbw/blk_xxx_xxx no longer exists at this moment.
   4. The reader thread treats this block as corrupt and removes the replica 
from the volume map, and the DataNode reports the deleted block to the NameNode.
   5. The NameNode removes this replica for the block.
   6. If the file's replication factor is 1, the file will have a missing 
block until this DataNode runs the DirectoryScanner again.
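
   Steps 1 to 3 can be reproduced without threads by performing the move 
between the lookup and the open. The following is a minimal sketch using only 
the JDK; the class name, temp-directory layout, and file name blk_1 are 
hypothetical and are not taken from the Hadoop code base.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; it only models the file move, not DataNode logic.
final class RbwFinalizeRaceDemo {

  /**
   * Returns true if opening the stale RBW path fails with
   * FileNotFoundException while the data still exists at the finalized path.
   */
  static boolean staleOpenFails() {
    try {
      Path volume = Files.createTempDirectory("volume");
      Path rbwDir = Files.createDirectory(volume.resolve("rbw"));
      Path finalizeDir = Files.createDirectory(volume.resolve("finalize"));

      // Step 1: the reader resolves the replica to its RBW data file.
      Path readerView = Files.write(rbwDir.resolve("blk_1"), new byte[] {1, 2, 3});

      // Step 2: the writer finalizes the block, moving the data file.
      Path finalized = Files.move(readerView, finalizeDir.resolve("blk_1"));

      // Step 3: the reader opens the path it resolved earlier.
      try (FileInputStream in = new FileInputStream(readerView.toFile())) {
        return false; // not expected: the old path should be gone
      } catch (FileNotFoundException e) {
        // The block is not lost; it is simply at its new location.
        return Files.exists(finalized);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("stale open failed, data intact: " + staleOpenFails());
  }
}
```

   The point of the sketch is that the exception alone does not distinguish a 
moved file from a lost one, which is why steps 4 to 6 can discard a healthy 
replica.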
   
   As described above, the FileNotFoundException encountered by the reader 
thread is expected, because the file has been moved.
   So we need to add a double check to the invalidateMissingBlock logic that 
verifies whether the data file or meta file exists, to avoid similar cases.
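
   One way to picture the double check is the simplified model below. It is a 
sketch under stated assumptions, not the code in this PR: the Replica class, 
the map keyed by block id, the boolean return value, and the use of 
`synchronized` in place of the DataNode's real locking are all hypothetical. 
The only idea carried over is that the replica is looked up again and its 
files are checked on disk before it is removed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these names are Hadoop APIs.
final class InvalidateDoubleCheckSketch {

  /** Stand-in for a replica entry: where its data and meta files live. */
  static final class Replica {
    final Path dataFile;
    final Path metaFile;

    Replica(Path dataFile, Path metaFile) {
      this.dataFile = dataFile;
      this.metaFile = metaFile;
    }
  }

  private final Map<Long, Replica> volumeMap = new HashMap<>();

  synchronized void put(long blockId, Replica replica) {
    volumeMap.put(blockId, replica);
  }

  /**
   * Removes the replica only if its current data file or meta file is really
   * missing on disk. Returns true if the replica was removed.
   */
  synchronized boolean invalidateMissingBlock(long blockId) {
    Replica current = volumeMap.get(blockId);
    if (current == null) {
      return false;
    }
    // Double check: a reader may have failed on a stale RBW path while the
    // finalized files exist, in which case the replica is healthy.
    if (Files.exists(current.dataFile) && Files.exists(current.metaFile)) {
      return false;
    }
    volumeMap.remove(blockId);
    return true;
  }

  /** Runs the scenario; returns {removedAfterFinalize, removedAfterDelete}. */
  static boolean[] demo() {
    try {
      Path volume = Files.createTempDirectory("volume");
      Path rbw = Files.createDirectory(volume.resolve("rbw"));
      Path fin = Files.createDirectory(volume.resolve("finalize"));
      Path data = Files.write(rbw.resolve("blk_1"), new byte[] {1});
      Path meta = Files.write(rbw.resolve("blk_1_1001.meta"), new byte[] {2});

      InvalidateDoubleCheckSketch dataset = new InvalidateDoubleCheckSketch();
      dataset.put(1L, new Replica(data, meta));

      // Writer finalizes: files move and the map now points at the new paths.
      Path finData = Files.move(data, fin.resolve("blk_1"));
      Path finMeta = Files.move(meta, fin.resolve("blk_1_1001.meta"));
      dataset.put(1L, new Replica(finData, finMeta));

      // Reader hit FileNotFoundException on the old path and asks to invalidate.
      boolean removedAfterFinalize = dataset.invalidateMissingBlock(1L);

      // A genuinely lost data file should still lead to removal.
      Files.delete(finData);
      boolean removedAfterDelete = dataset.invalidateMissingBlock(1L);
      return new boolean[] {removedAfterFinalize, removedAfterDelete};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("removed after finalize: " + result[0]);
    System.out.println("removed after real loss: " + result[1]);
  }
}
```

   In this model the invalidation request that follows a finalize leaves the 
replica in place, while a request made after the data file is actually deleted 
still removes it, so real losses continue to be reported.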
   




> Fix DataNode may invalidates normal block causing missing block
> ---------------------------------------------------------------
>
>                 Key: HDFS-17342
>                 URL: https://issues.apache.org/jira/browse/HDFS-17342
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> When users read a file that is being appended to, occasional exceptions may 
> occur, such as org.apache.hadoop.hdfs.BlockMissingException: Could not 
> obtain block: xxx.
> This can happen if one thread is reading the block while a writer thread is 
> finalizing it at the same time.
> *Root cause:*
> # The reader thread obtains an RBW replica from the VolumeMap, such as 
> blk_xxx_xxx[RBW], whose data file should be in /XXX/rbw/blk_xxx.
> # Simultaneously, the writer thread finalizes this block, moving it from 
> the RBW directory to the FINALIZE directory. The data file is moved from 
> /XXX/rbw/blk_xxx to /XXX/finalize/blk_xxx.
> # The reader thread attempts to open the data input stream but encounters a 
> FileNotFoundException because the data file /XXX/rbw/blk_xxx or meta file 
> /XXX/rbw/blk_xxx_xxx no longer exists at this moment.
> # The reader thread treats this block as corrupt and removes the replica 
> from the volume map, and the DataNode reports the deleted block to the 
> NameNode.
> # The NameNode removes this replica for the block.
> # If the file's replication factor is 1, the file will have a missing block 
> until this DataNode runs the DirectoryScanner again.
> As described above, the FileNotFoundException encountered by the reader 
> thread is expected, because the file has been moved.
> So we need to add a double check to the invalidateMissingBlock logic that 
> verifies whether the data file or meta file exists, to avoid similar cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
