teamconfx opened a new pull request, #8203:
URL: https://github.com/apache/hadoop/pull/8203
<!--
Thanks for sending a pull request!
1. If this is your first time, please read our contributor guidelines:
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
2. Make sure your PR title starts with JIRA issue id, e.g.,
'HADOOP-17799. Your PR title ...'.
-->
### Description of PR
This PR fixes [HDFS-17863](https://issues.apache.org/jira/browse/HDFS-17863).
The bug accurs where under-construction files become unreadable after
DataNode restart, even though the data was successfully flushed with hflush().
This breaks HDFS's visibility guarantee for flushed data.
When a DataNode restarts, under-construction block replicas in the "rbw"
(replica being written) directory are loaded as ReplicaWaitingToBeRecovered
(RWR state). The getVisibleLength() method in this class unconditionally
returned -1:
```java
// Before (ReplicaWaitingToBeRecovered.java:75-77)
@Override
public long getVisibleLength() {
return -1; //no bytes are visible
}
```
When a client tries to read the file:
1. DFSInputStream calls readBlockLength() to determine the
under-construction block length
2. It contacts the DataNode via getReplicaVisibleLength()
3. The DataNode returns -1 (from RWR replica)
4. Client treats this as invalid and throws
CannotObtainBlockLengthException
This violates HDFS's hflush() contract which guarantees that flushed data
remains visible to readers.
### Changes
Changed ReplicaWaitingToBeRecovered.getVisibleLength() to return
getNumBytes() instead of -1:
```java
// After (ReplicaWaitingToBeRecovered.java:75-77)
@Override
public long getVisibleLength() {
return getNumBytes(); // all bytes are visible since validated on load
}
```
### Why This Fix Is Safe
The fix is safe because the block length returned by getNumBytes() has
already been validated against checksums when the replica is loaded from disk.
In BlockPoolSlice.addReplicaToReplicasMap() (lines 693-700), RWR replicas
are created with a validated length:
```java
if (loadRwr) {
ReplicaBuilder builder = new ReplicaBuilder(ReplicaState.RWR)
.setBlockId(blockId)
.setLength(validateIntegrityAndSetLength(file, genStamp)) // <--
Validated!
.setGenerationStamp(genStamp)
...
}
```
The validateIntegrityAndSetLength() method (lines 871-920):
1. Reads the checksum from the meta file
2. Validates the last chunk of data against its checksum
3. Returns only the length of data that passes checksum validation
4. Truncates any corrupted trailing data
Therefore, getNumBytes() returns a checksum-verified length that is safe
to expose to readers. This is the same validation used for RBW replicas loaded
with valid restart metadata.
A New test added:
- TestPersistBlocks#testReadUnderConstructionFileAfterDataNodeRestart -
Specifically reproduces the JIRA scenario
I also ran other tests related to this change:
Regression tests - all pass (58 tests):
- TestDatanodeRestart (2 tests)
- TestLeaseRecovery2 (8 tests)
- TestFileLengthOnClusterRestart (1 test)
- TestReadWhileWriting (1 test)
- TestTransferRbw (1 test)
- TestWriteRead (3 tests)
- TestWriteToReplica (6 tests)
- TestBlockRecovery (17 tests)
- TestBlockRecovery2 (5 tests)
- TestBlockListAsLongs (7 tests)
- TestPersistBlocks (7 tests)
### For code changes:
- [x] Does the title or this PR starts with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
If an AI tool was used:
- [ ] The PR includes the phrase "Contains content generated by <tool>"
where <tool> is the name of the AI tool used.
- [ ] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]