iemejia opened a new pull request, #8511:
URL: https://github.com/apache/hadoop/pull/8511
## Description
`ChecksumFileSystem.readVectored()` allocates buffers for both data ranges
and checksum ranges through the caller's allocator (`IntFunction<ByteBuffer>`).
After checksum verification completes, the checksum buffers were never released
because:
1. The caller has no reference to checksum buffers (only data range results
are visible)
2. The system did not call the `release` consumer on them
This causes buffer leaks when callers use a tracking or pooled allocator.
This bug was discovered while working on Apache Parquet's
`TrackingByteBufferAllocator`, which detected unreleased buffers accumulating
during vectored reads through `ChecksumFileSystem`.
## Root Cause
In `ChecksumFileSystem.ChecksumFSInputChecker.readVectored()`, checksum
ranges are read via `sums.readVectored(checksumRanges, allocate, release)`. The
checksum buffers are used in `thenCombineAsync` calls for verification, but
after verification completes, no code released them back to the caller's pool.
A single checksum range can cover multiple data ranges, so the checksum
buffer is shared across multiple `thenCombineAsync` calls — it must only be
released once, after ALL verifications using it are complete.
## Fix
After collecting all verification futures for a checksum range, use
`CompletableFuture.allOf(verifications).thenRun(...)` to release the checksum
buffer exactly once after all verifications complete:
```java
CompletableFuture.allOf(verifications.toArray(new CompletableFuture[0]))
.thenRun(() -> release.accept(checksumRange.getData().join()));
```
This ensures:
- Exactly-once release (not once per data range that shares the checksum
buffer)
- Release happens only after the buffer is no longer in use
- Works correctly with both the 2-arg API (no-op release) and the 3-arg API
## Testing
Added `testChecksumBuffersReleasedAfterVectoredRead()` to
`TestLocalFSContractVectoredRead`:
- Uses a counting allocator/release pair to track buffer lifecycle
- Performs a vectored read through `LocalFileSystem` (which uses
`ChecksumFileSystem`)
- Asserts that the system calls the release consumer at least once for
internal buffers
- Confirmed the test **fails without the fix** (0 system-initiated releases)
and **passes with it**
All 54 existing vectored read contract tests continue to pass.
## JIRA
https://issues.apache.org/jira/browse/HADOOP-19901
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]