zihanx opened a new pull request, #15803:
URL: https://github.com/apache/lucene/pull/15803
…leaf reader
### Summary
This PR adds a new utility method `ReaderUtil.partitionByLeaf(int[]
sortedDocIds, List<LeafReaderContext> leaves)` that efficiently partitions a
sorted array of global doc IDs across the leaves of an `IndexReader`.
### Motivation
A common pattern when working with top-N hits is to map global doc IDs back
to their per-segment leaves. The existing `ReaderUtil.subIndex` does a binary
search per doc ID (`O(n * log(n))`), but when you need to partition *all* hits
at once, a merge-sort-style linear scan is more efficient:
1. Sort the input doc IDs: `O(n * log(n))` with small constant (primitive
int sort, potentially SIMD-accelerated).
2. Linear pass intersecting sorted doc IDs with leaf boundaries: `O(n)`.
### Changes
- Added `ReaderUtil.partitionByLeaf(int[] sortedDocIds,
List<LeafReaderContext> leaves)`, two-pass linear algorithm:
- **Pass 1 (count + allocate)**: walk doc IDs and leaf boundaries
together, count hits per leaf, allocate result arrays
- **Pass 2 (copy)**: `System.arraycopy` each leaf's contiguous slice from
the sorted input
- Added `TestReaderUtil` with coverage for empty input, single segment,
multiple segments, and segments with no hits
### Example Use Case
```java
int[] sortedTopDocIds = ...; // sorted global doc IDs from top-N collection
List<LeafReaderContext> leaves = reader.leaves();
int[][] perLeafDocIds = ReaderUtil.partitionByLeaf(sortedTopDocIds, leaves);
for (int i = 0; i < leaves.size(); i++) {
if (perLeafDocIds[i].length == 0) {
continue; // skip segments with no hits
}
// process hits for this leaf
processLeaf(leaves.get(i), perLeafDocIds[i]);
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]