voonhous opened a new pull request, #18736:
URL: https://github.com/apache/hudi/pull/18736

   ### Describe the issue this Pull Request addresses
   
   The existing tests for `BatchedBlobReader` (introduced in #18098) only check 
**byte-level correctness** of the returned data. They do not exercise the 
merging/batching logic itself, so a regression that silently disables the I/O 
reduction (for example, a bad gap-threshold comparison or a broken range merge) 
would still pass. This PR adds coverage that fails when the batching no longer 
batches.
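
The following is a minimal Java sketch of gap-threshold range merging, which shows concretely what the new tests protect. It is not the `BatchedBlobReader` implementation. `BlobRange`, `MergedRange`, and `RangeMergerSketch.merge` are hypothetical names, and the sketch assumes that a gap exactly equal to the threshold still merges, as the "inclusive boundary" scenario suggests. It groups requests per file, sorts them by offset, keeps each caller's original index, and rejects overlaps. These are the properties `TestBatchedBlobReaderMerge` is described as asserting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical read request: `length` bytes at `offset` in `file`; `index` is the caller's position. */
record BlobRange(String file, long offset, long length, int index) {
  long end() {
    return offset + length;
  }
}

/** Hypothetical coalesced read [start, end) in one file, remembering which requests it serves. */
record MergedRange(String file, long start, long end, List<Integer> indices) {}

final class RangeMergerSketch {
  private RangeMergerSketch() {}

  /** Coalesces same-file requests whose gap is <= maxGap (assumed inclusive); rejects overlaps. */
  static List<MergedRange> merge(List<BlobRange> ranges, long maxGap) {
    // Group per file in first-seen order so the output is deterministic.
    Map<String, List<BlobRange>> byFile = new LinkedHashMap<>();
    for (BlobRange r : ranges) {
      byFile.computeIfAbsent(r.file(), f -> new ArrayList<>()).add(r);
    }
    List<MergedRange> out = new ArrayList<>();
    for (List<BlobRange> group : byFile.values()) {
      group.sort(Comparator.comparingLong(BlobRange::offset)); // input order may be interleaved
      String file = group.get(0).file();
      long start = group.get(0).offset();
      long end = group.get(0).end();
      List<Integer> indices = new ArrayList<>(List.of(group.get(0).index()));
      for (BlobRange r : group.subList(1, group.size())) {
        if (r.offset() < end) {
          throw new IllegalArgumentException("overlapping ranges in " + file + " at offset " + r.offset());
        }
        if (r.offset() - end <= maxGap) {
          end = r.end(); // small gap: extend the current read instead of seeking again
          indices.add(r.index());
        } else {
          out.add(new MergedRange(file, start, end, indices));
          start = r.offset();
          end = r.end();
          indices = new ArrayList<>(List.of(r.index()));
        }
      }
      out.add(new MergedRange(file, start, end, indices));
    }
    return out;
  }
}
```

Consider three 10-byte requests at offsets 0, 20, and 100 in one file with `maxGap = 10`. They produce two merged reads, [0, 30) and [100, 110). If `<=` is flipped to `<`, the result is three reads, yet every returned byte stays correct. Byte-level tests cannot see that kind of regression.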
   
   ### Summary and Changelog
   
   Adds two focused test classes for `BatchedBlobReader`:
   
   - **`TestBatchedBlobReaderMerge`**: unit tests for `mergeRanges` and `identifyConsecutiveRanges` (made package-private). They assert merged-range counts, inclusive/exclusive gap-threshold boundaries, multi-file grouping, sort order, index preservation, and rejection of overlapping ranges. No Spark, no I/O.
   - **`TestBatchedBlobReaderIO`**: integration tests that drive `processPartition` against a `CountingHoodieStorage` wrapper around a real storage implementation and assert the *number* of `openSeekable` / `seek` calls. Scenarios:
     1. Many blobs in one file.
     2. Contiguous zero-gap blobs.
     3. Threshold-controlled small/large gaps (including the inclusive 
boundary).
     4. Multi-file queries (per-file batching, mixed gap patterns, interleaved 
input order).
   - Bumps the visibility of two `BatchedBlobReader` helpers from `private` to 
package-private so the merge tests can call them directly.
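
The I/O-count assertions follow a decorator pattern. The storage is wrapped, calls are counted, and the counts are compared with the number of reads that batching should need. The sketch below is self-contained and hypothetical. `Store`, `SeekableIn`, `MemStore`, `CountingStore`, and `ToyBatchedRead` stand in for `HoodieStorage`, its seekable stream, `CountingHoodieStorage`, and `processPartition`, and they do not reproduce the real signatures. The toy reader opens each file once. It seeks only when the gap to the next range exceeds `maxGap` and reads through smaller gaps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical seekable handle (not the Hudi API). */
interface SeekableIn {
  void seek(long pos);

  /** Reads up to len bytes; returns the number read, or 0 at end of data. */
  int read(byte[] buf, int off, int len);
}

/** Hypothetical storage that opens seekable handles (not HoodieStorage). */
interface Store {
  SeekableIn openSeekable(String path);
}

/** In-memory store so the sketch runs without a filesystem. */
final class MemStore implements Store {
  private final Map<String, byte[]> files = new HashMap<>();

  void put(String path, byte[] data) { files.put(path, data); }

  @Override
  public SeekableIn openSeekable(String path) {
    byte[] data = files.get(path);
    return new SeekableIn() {
      private int pos;

      @Override public void seek(long p) { pos = (int) p; }

      @Override
      public int read(byte[] buf, int off, int len) {
        int n = Math.max(0, Math.min(len, data.length - pos));
        if (n == 0) return 0;
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
      }
    };
  }
}

/** Decorator counting calls, in the spirit of the PR's CountingHoodieStorage. */
final class CountingStore implements Store {
  int opens;
  int seeks;
  private final Store delegate;

  CountingStore(Store delegate) { this.delegate = delegate; }

  @Override
  public SeekableIn openSeekable(String path) {
    opens++;
    SeekableIn in = delegate.openSeekable(path);
    return new SeekableIn() {
      @Override public void seek(long p) { seeks++; in.seek(p); }
      @Override public int read(byte[] b, int off, int len) { return in.read(b, off, len); }
    };
  }
}

/** Hypothetical stand-in for the batched read path. */
final class ToyBatchedRead {
  private ToyBatchedRead() {}

  /** Reads sorted, non-overlapping {offset, length} ranges from one file with a single open. */
  static byte[][] read(Store store, String path, List<long[]> ranges, long maxGap) {
    SeekableIn in = store.openSeekable(path);
    byte[][] out = new byte[ranges.size()][];
    long pos = -1; // no position yet, so the first range always seeks
    for (int i = 0; i < ranges.size(); i++) {
      long off = ranges.get(i)[0];
      int len = (int) ranges.get(i)[1];
      long gap = off - pos;
      if (pos < 0 || gap > maxGap) {
        in.seek(off); // gap too large: pay for a seek
      } else if (gap > 0) {
        readFully(in, new byte[(int) gap]); // small gap: read through and discard it
      }
      out[i] = new byte[len];
      readFully(in, out[i]);
      pos = off + len;
    }
    return out;
  }

  private static void readFully(SeekableIn in, byte[] buf) {
    int done = 0;
    while (done < buf.length) {
      int n = in.read(buf, done, buf.length - done);
      if (n <= 0) throw new IllegalStateException("unexpected end of data");
      done += n;
    }
  }
}
```

Take a 100-byte file and read ranges {0,10}, {10,5}, {20,5}, and {80,5} with `maxGap = 5`. That should cost one open and two seeks, one at offset 0 and one at offset 80. With `maxGap = 4`, the 5-byte gap before offset 20 forces a third seek. A test that pins these counts fails as soon as batching degrades to one seek per blob.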
   
   ### Impact
   
   *Tests only.* No production logic changes. The only change to production code relaxes two `BatchedBlobReader` helpers from `private` to package-private, so they remain visible only within the reader's package.
   
   ### Risk Level
   
   **none**
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [X] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [X] Enough context is provided in the sections above
   - [X] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.