voonhous opened a new pull request, #18736:
URL: https://github.com/apache/hudi/pull/18736
### Describe the issue this Pull Request addresses
The existing tests for `BatchedBlobReader` (introduced in #18098) only check
**byte-level correctness** of the returned data. They do not exercise the
merging/batching logic itself, so a regression that silently disables the I/O
reduction (for example, a bad gap-threshold comparison or a broken range merge)
would still pass. This PR adds coverage that fails when reads are no longer
merged into batches.
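
To make the gap concrete, here is a minimal, self-contained sketch (not Hudi code) of why a byte-level assertion cannot detect lost batching. `CountingSource`, `readEach`, and `readBatched` are hypothetical names invented for this illustration; they stand in for a storage wrapper that counts seeks and for an unbatched versus a batched read path. Both paths return identical bytes, and only the seek count tells them apart.

```java
import java.util.Arrays;

final class SeekCountSketch {

  /** Hypothetical in-memory "file" that counts how often seek() is called. */
  static final class CountingSource {
    private final byte[] data;
    private int position;
    int seekCount;

    CountingSource(byte[] data) {
      this.data = data;
    }

    void seek(int offset) {
      seekCount++;
      position = offset;
    }

    byte[] read(int length) {
      byte[] out = Arrays.copyOfRange(data, position, position + length);
      position += length;
      return out;
    }
  }

  /** Unbatched path: one seek per blob. Each range is {offset, length}. */
  static byte[][] readEach(CountingSource src, int[][] ranges) {
    byte[][] out = new byte[ranges.length][];
    for (int i = 0; i < ranges.length; i++) {
      src.seek(ranges[i][0]);
      out[i] = src.read(ranges[i][1]);
    }
    return out;
  }

  /**
   * Batched path: one seek for the whole span, then slice out each blob.
   * Assumes ranges is non-empty, sorted by offset, and non-overlapping.
   */
  static byte[][] readBatched(CountingSource src, int[][] ranges) {
    int start = ranges[0][0];
    int[] last = ranges[ranges.length - 1];
    src.seek(start);
    byte[] span = src.read(last[0] + last[1] - start);
    byte[][] out = new byte[ranges.length][];
    for (int i = 0; i < ranges.length; i++) {
      int from = ranges[i][0] - start;
      out[i] = Arrays.copyOfRange(span, from, from + ranges[i][1]);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] file = new byte[64];
    for (int i = 0; i < file.length; i++) {
      file[i] = (byte) i;
    }
    int[][] ranges = {{0, 8}, {10, 8}, {20, 8}};

    CountingSource unbatched = new CountingSource(file);
    CountingSource batched = new CountingSource(file);
    byte[][] a = readEach(unbatched, ranges);
    byte[][] b = readBatched(batched, ranges);

    // A byte-level check passes for both paths ...
    System.out.println(Arrays.deepEquals(a, b)); // true
    // ... while the seek counts differ.
    System.out.println(unbatched.seekCount + " vs " + batched.seekCount); // 3 vs 1
  }
}
```

A test that asserts only on the returned bytes accepts either path; asserting on the call count is what pins down the batching behavior.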
### Summary and Changelog
Adds two focused test classes for `BatchedBlobReader`:
- **`TestBatchedBlobReaderMerge`**: unit tests against `mergeRanges` and
  `identifyConsecutiveRanges` (made package-private). Asserts merged-range
  counts, gap-threshold inclusive/exclusive boundaries, multi-file grouping,
  sorting, index preservation, and rejection of overlapping ranges. No Spark, no I/O.
- **`TestBatchedBlobReaderIO`**: integration tests that drive
  `processPartition` against a `CountingHoodieStorage` wrapper around a real
  storage and assert the *number* of `openSeekable` / `seek` calls. Scenarios:
  1. Many blobs in one file.
  2. Contiguous zero-gap blobs.
  3. Threshold-controlled small/large gaps (including the inclusive
     boundary).
  4. Multi-file queries (per-file batching, mixed gap patterns, interleaved
     input order).
- Bumps the visibility of two `BatchedBlobReader` helpers from `private` to
package-private so the merge tests can call them directly.
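
For readers unfamiliar with gap-threshold merging, the kind of behavior the merge tests pin down can be sketched as follows. This is a standalone illustration under stated assumptions, not the `BatchedBlobReader` implementation: the `Range` type and `mergeWithinGap` method are hypothetical, ranges are taken to be half-open `[start, end)` within a single file, and a gap exactly equal to the threshold is assumed to merge (the inclusive boundary).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RangeMergeSketch {

  /** Half-open byte range [start, end) within one file (hypothetical type). */
  static final class Range {
    final long start;
    final long end;

    Range(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  /**
   * Sorts ranges by start offset and merges neighbors whose gap is at most
   * maxGapBytes (inclusive, by assumption). Overlapping ranges are rejected.
   */
  static List<Range> mergeWithinGap(List<Range> input, long maxGapBytes) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Range r) -> r.start));
    List<Range> merged = new ArrayList<>();
    for (Range next : sorted) {
      if (merged.isEmpty()) {
        merged.add(next);
        continue;
      }
      Range last = merged.get(merged.size() - 1);
      long gap = next.start - last.end;
      if (gap < 0) {
        throw new IllegalArgumentException("Overlapping ranges: " + last + " and " + next);
      }
      if (gap <= maxGapBytes) {
        // Close enough: extend the previous merged range to cover this one.
        merged.set(merged.size() - 1, new Range(last.start, next.end));
      } else {
        merged.add(next);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Unsorted input with gaps of 4 and 5 bytes once sorted.
    List<Range> input = Arrays.asList(new Range(25, 30), new Range(0, 10), new Range(14, 20));
    System.out.println(mergeWithinGap(input, 4)); // [[0, 20), [25, 30)]
    System.out.println(mergeWithinGap(input, 3)); // [[0, 10), [14, 20), [25, 30)]
  }
}
```

Asserting on the size of the merged list at thresholds of 3 and 4 is what catches an off-by-one in the comparison (`<` versus `<=`), which a byte-level check would not notice.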
### Impact
*Tests only.* No production code logic changes. The visibility relaxation on
two helpers is scoped to the same package as the reader.
### Risk Level
**none**
### Documentation Update
none
### Contributor's checklist
- [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [X] Enough context is provided in the sections above
- [X] Adequate tests were added if applicable