Kaushal Khator created HDFS-17862:
-------------------------------------

             Summary: Race condition between DirectoryScanner and append 
operations causes block corruption on single-replica blocks
                 Key: HDFS-17862
                 URL: https://issues.apache.org/jira/browse/HDFS-17862
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 3.4.1
            Reporter: Kaushal Khator


A race condition exists between the DirectoryScanner reconciliation thread and 
HDFS append operations that can cause blocks to be incorrectly marked as 
corrupt when they have only a single replica. The issue occurs when the 
DirectoryScanner runs while an append operation is in progress on a 
{{FINALIZED}} replica.
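
*Reproduction Sketch (illustrative)*

The following is a rough, best-effort sketch against a {{MiniDFSCluster}} with a single DataNode, replication factor 1, and an aggressive {{dfs.datanode.directoryscan.interval}}. The race window is timing-dependent, so this is illustrative rather than a deterministic reproducer; the path, payload sizes, and sleep duration are arbitrary.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class AppendScannerRaceSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Run the directory scanner as often as possible (interval is in seconds).
    conf.setInt("dfs.datanode.directoryscan.interval", 1);

    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path path = new Path("/append-race-test");

      // Create and close the file so its only replica becomes FINALIZED.
      try (FSDataOutputStream out = fs.create(path, (short) 1)) {
        out.write(new byte[1024]);
      }

      // Re-open for append and keep the stream open so the append is still in
      // progress while the DirectoryScanner reconciles the volume.
      try (FSDataOutputStream out = fs.append(path)) {
        out.write(new byte[512]);
        out.hflush();
        // Give the scanner a chance to run mid-append; a false corruption
        // report would show up in the DataNode log / block reports here.
        Thread.sleep(5000);
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}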

 

*Root Cause*

The race condition occurs due to the following sequence:

1. HDFS append operations are performed directly on replicas in the 
{{FINALIZED}} state without transitioning them to a non-finalized state. The 
DataNode layers a {{ReplicaInPipeline}} on top of the {{FINALIZED}} replica for 
the append, but the underlying replica state remains {{FINALIZED}}.

2. The DirectoryScanner's {{checkAndUpdate}} logic is designed to skip only 
non-finalized replicas during reconciliation.

3. The system does not expose an append-in-progress state that the 
DirectoryScanner could use to skip such blocks.

4. When the DirectoryScanner runs during an active append, it can detect a 
length mismatch between the in-memory replica metadata and the on-disk block 
size, for example when the new {{.meta}} file has not yet been fully written at 
the time the scanner runs.

5. The scanner misinterprets this transient state as corruption and marks the 
block as corrupt (see the simplified sketch after this list).
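
To make steps 4 and 5 concrete, below is a heavily simplified, self-contained model of the reconciliation check. The names used here ({{MemReplica}}, {{appendInProgress}}, this {{checkAndUpdate}}) are hypothetical stand-ins rather than the actual DataNode code; the sketch only illustrates why skipping non-finalized replicas alone is not sufficient.

{code:java}
import java.io.File;

/** Hypothetical, simplified model of the DirectoryScanner reconciliation check. */
public class ScannerReconciliationModel {

  enum ReplicaState { FINALIZED, RBW, RWR, RUR, TEMPORARY }

  /** Minimal stand-in for the in-memory replica record. */
  static class MemReplica {
    ReplicaState state;       // stays FINALIZED during an append (per this issue)
    long numBytes;            // in-memory length
    boolean appendInProgress; // hypothetical flag that is NOT exposed today
  }

  /** Returns true if the replica would be (mis)reported as corrupt. */
  static boolean checkAndUpdate(MemReplica mem, File blockFile) {
    // Existing guard: only non-FINALIZED replicas are skipped. A replica being
    // appended to still reports FINALIZED, so it falls through to the check below.
    if (mem.state != ReplicaState.FINALIZED) {
      return false; // skipped by the scanner
    }

    // Missing guard suggested by this issue (hypothetical):
    // if (mem.appendInProgress) { return false; }

    long onDiskLength = blockFile.length();
    if (onDiskLength != mem.numBytes) {
      // During an active append the block file and the new .meta file can be
      // mid-write, so this transient mismatch is misreported as corruption.
      return true; // block marked corrupt
    }
    return false;
  }
}
{code}

The point of the sketch is only that the state check cannot distinguish a stable {{FINALIZED}} replica from one that is mid-append; the scanner would need an explicit append-in-progress signal (or an equivalent state transition) to skip these blocks safely.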

 



