[ https://issues.apache.org/jira/browse/HDFS-17863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053767#comment-18053767 ]

ASF GitHub Bot commented on HDFS-17863:
---------------------------------------

hadoop-yetus commented on PR #8203:
URL: https://github.com/apache/hadoop/pull/8203#issuecomment-3788103798

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  25m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  |  trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 55s |  |  trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  spotbugs  |   2m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 38s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  spotbugs  |   1m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  6s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 174m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8203/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 248m 31s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8203/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/8203 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux b04473ab70b6 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 405db4e6657c234a96852423d7a38ad4203d3862 |
   | Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
   | Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8203/1/testReport/ |
   | Max. process+thread count | 4830 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8203/1/console |
   | versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
   | Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> CannotObtainBlockLengthException after DataNode restart
> -------------------------------------------------------
>
>                 Key: HDFS-17863
>                 URL: https://issues.apache.org/jira/browse/HDFS-17863
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, dfs, hdfs-client
>    Affects Versions: 3.3.5
>         Environment: Hadoop 3.3.5
> Java 8
> Maven 3.6.3
>            Reporter: rstest
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: reproduce.sh, restart.patch
>
>
> h2. DESCRIPTION:
> After hflush(), HDFS guarantees that written data becomes visible to readers,
> even while the file remains under construction. This guarantee is BROKEN after
> DataNode restart. Under-construction blocks that have been flushed become
> inaccessible (visible length = -1) until explicit lease recovery, causing
> CannotObtainBlockLengthException when clients try to read the file.
> This is a genuine production bug that affects:
>  - HBase WAL recovery after DataNode failures
>  - Streaming applications that write and read simultaneously
>  - Any application relying on hflush() visibility guarantees
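> For context, the pattern all of these rely on is the documented hflush() contract; a minimal sketch follows (the class name, path, and payload are illustrative, not taken from the attached reproducer):
> {code:java}
> import java.nio.charset.StandardCharsets;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HflushVisibility {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path path = new Path("/tmp/wal");  // illustrative path
>     try (FSDataOutputStream out = fs.create(path)) {
>       out.write("hello\n".getBytes(StandardCharsets.UTF_8));
>       out.hflush();  // contract: these bytes are now visible to new readers
>       // A concurrent reader opens the file while it is still under construction:
>       try (FSDataInputStream in = fs.open(path)) {
>         byte[] buf = new byte[6];
>         in.readFully(buf);  // must succeed per the hflush() contract
>       }
>     }
>   }
> }
> {code}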
> h2. STEPS TO REPRODUCE:
> Download the attached reproduce.sh and restart.patch, then run:
> {code}
> $ bash reproduce.sh
> {code}
> The script:
> 1. Clones the Hadoop repository (release 3.3.5 branch)
> 2. Applies the test patch (restart.patch) that adds the reproduction test
> 3. Builds the Hadoop HDFS module
> 4. Runs the test case TestBlockToken#testLastLocatedBlockTokenExpiryWithDataNodeRestart (condensed as a sketch below)
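> Condensed, the restart scenario the test exercises looks roughly like this (a hedged sketch, not the attached restart.patch; MiniDFSCluster is HDFS's in-process test cluster, and the path and payload are illustrative):
> {code:java}
> @Test
> public void testReadAfterHflushAndDataNodeRestart() throws Exception {
>   Configuration conf = new HdfsConfiguration();
>   MiniDFSCluster cluster =
>       new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
>   try {
>     cluster.waitActive();
>     DistributedFileSystem fs = cluster.getFileSystem();
>     Path path = new Path("/uc-file");
>     FSDataOutputStream out = fs.create(path);
>     out.write("hello\n".getBytes(StandardCharsets.UTF_8));  // 6 bytes, as in the evidence below
>     out.hflush();  // flushed bytes should stay visible from here on
>
>     cluster.restartDataNode(0, true);  // replica reloads from disk in RWR state
>     cluster.waitActive();
>
>     // The writer's stream is still open, so the lease is still held. On
>     // affected versions this read throws CannotObtainBlockLengthException.
>     try (FSDataInputStream in = fs.open(path)) {
>       in.read();
>     }
>   } finally {
>     cluster.shutdown();
>   }
> }
> {code}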
> *EXPECTED RESULT:*
> The file should be readable after hflush(), even after a DataNode restart.
> *ACTUAL RESULT:*
> org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock
> The bug is confirmed if the test fails with CannotObtainBlockLengthException.
> *KEY OBSERVATION:*
>  - Tests with NameNode-only restart (no DataNode restart) DO NOT fail
>  - The bug ONLY occurs when DataNode restarts
> h2. ROOT CAUSE:
> When a DataNode restarts, under-construction block replicas are loaded from
> disk and placed in ReplicaWaitingToBeRecovered (RWR) state:
> File: ReplicaWaitingToBeRecovered.java:75
> {code:java}
> @Override  // ReplicaInPipeline
> public long getVisibleLength() {
>   return -1;  // no bytes are visible
> }
> {code}
> This state explicitly returns -1 for the visible length, meaning "no bytes visible" until lease recovery completes.
> When a client tries to open the file:
> 1. DFSInputStream calls readBlockLength() to determine the under-construction block's length
> 2. It contacts a DataNode via getReplicaVisibleLength()
> 3. It receives -1 (not a valid length)
> 4. It treats this as a failure and tries the next DataNode
> 5. All DataNodes return -1
> 6. readBlockLength() throws CannotObtainBlockLengthException (see the sketch below)
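> The probing loop in question, as a simplified sketch of DFSInputStream#readBlockLength (the real method also handles socket timeouts, retries, and block-token refresh, and its field plumbing differs slightly):
> {code:java}
> // Simplified sketch; field names (dfsClient, conf) follow DFSInputStream.
> private long readBlockLength(LocatedBlock locatedblock) throws IOException {
>   for (DatanodeInfo datanode : locatedblock.getLocations()) {
>     ClientDatanodeProtocol cdp = null;
>     try {
>       cdp = DFSUtilClient.createClientDatanodeProtocolProxy(datanode,
>           dfsClient.getConfiguration(), conf.getSocketTimeout(),
>           conf.isConnectToDnViaHostname(), locatedblock);
>       long n = cdp.getReplicaVisibleLength(locatedblock.getBlock());
>       if (n >= 0) {
>         return n;  // a replica reported a usable visible length
>       }
>       // n == -1: the replica is in RWR after the restart; treated as a miss
>     } catch (IOException ioe) {
>       // DataNode unreachable: fall through and try the next location
>     } finally {
>       if (cdp != null) {
>         RPC.stopProxy(cdp);
>       }
>     }
>   }
>   // every replica returned -1 or was unreachable
>   throw new CannotObtainBlockLengthException(locatedblock);
> }
> {code}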
> The problem persists because:
>  - The lease is still held by the original client (the output stream is still open)
>  - The client is still alive (from HDFS's perspective)
>  - Automatic lease recovery only triggers when the lease holder is detected as dead
>  - There is no mechanism to recover automatically in this scenario; explicit lease recovery is required (see the workaround sketch below)
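> A hedged sketch of that workaround, using the public DistributedFileSystem#recoverLease API (the path argument and poll interval are illustrative):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
>
> public class ForceLeaseRecovery {
>   public static void main(String[] args) throws Exception {
>     DistributedFileSystem dfs =
>         (DistributedFileSystem) FileSystem.get(new Configuration());
>     Path stuck = new Path(args[0]);  // the unreadable under-construction file
>     // recoverLease() returns false while recovery is still in progress,
>     // so poll until the NameNode reports the file closed.
>     while (!dfs.recoverLease(stuck)) {
>       Thread.sleep(1000L);  // illustrative poll interval
>     }
>     // The file is now closed and readable again, matching the
>     // "AFTER Explicit Lease Recovery" evidence below.
>   }
> }
> {code}
> The same effect is available from the shell via {{hdfs debug recoverLease -path <file>}}.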
> h2. DIAGNOSTIC EVIDENCE:
> BEFORE DataNode Restart:
>  - File under construction: true
>  - Block length: 6 bytes
>  - Block is complete: false
>  - DataNode replica visible length: 6  ✅ READABLE
> AFTER DataNode Restart:
>  - File under construction: true
>  - Block length: 6 bytes
>  - Block is complete: false
>  - DataNode replica visible length: -1  ❌ UNREADABLE!
> AFTER Explicit Lease Recovery:
>  - File under construction: false
>  - Block length: 6 bytes
>  - Block is complete: true
>  - DataNode replica visible length: 6  ✅ READABLE AGAIN!
> h2. WHY THIS IS A BUG (NOT EXPECTED BEHAVIOR):
> HDFS Guarantees:
>  - hflush() ensures data is visible to new readers
>  - Under-construction files should be readable after hflush()
>  - This is the documented contract for hflush()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
