deepthi912 opened a new pull request, #18478:
URL: https://github.com/apache/pinot/pull/18478

   ## Problem
   
   `SegmentPrunerService.removeEmptySegments` calls 
`UpsertUtils.hasNoQueryableDocs` to drop empty segments before query execution. 
Today that method reads the segment's **live** `ThreadSafeMutableRoaringBitmap` 
via `IndexSegment.getQueryableDocIds()` / `getValidDocIds()`.
   
   For upsert tables running with consistency mode `SYNC` or `SNAPSHOT`, the 
live bitmap can diverge from the per-query snapshot maintained by 
`UpsertViewManager` between refreshes. So a "no queryable docs here" reading on 
the live bitmap can cause the pruner to drop a segment that is non-empty in the 
snapshot view the query is about to scan — producing wrong row counts.
   
   ## Change
   
   Surface the snapshot to the pruner without breaking the SPI ⇄ impl layering. 
The dependency direction stays the same (`pinot-segment-local` → 
`pinot-segment-spi`), and no new module dependencies are added.
   
   | File | Module | Change |
   | --- | --- | --- |
   | `IndexSegment.java` | `pinot-segment-spi` | +2 default methods: `@Nullable 
MutableRoaringBitmap getQueryableDocIdsSnapshot()` and `boolean 
isUpsertConsistencyModeEnabled()`. Both default to `null` / `false`, so 
non-upsert segments are unaffected. |
   | `PartitionUpsertMetadataManager.java` | `pinot-segment-local` | +1 method 
on the interface: `@Nullable UpsertViewManager getUpsertViewManager()`. 
`BasePartitionUpsertMetadataManager` already implements it, so no impl changes. 
|
   | `UpsertViewManager.java` | `pinot-segment-local` | +1 public lookup 
`getQueryableDocIdsSnapshot(IndexSegment)`. Lock-free: volatile read of 
`_segmentQueryableDocIdsMap` + map lookup. Writers replace the map reference 
atomically under `_upsertViewLock` (write lock), never mutating the old one. |
   | `ImmutableSegmentImpl.java` / `MutableSegmentImpl.java` | 
`pinot-segment-local` | Override both SPI methods, delegating to the 
partition's `UpsertViewManager`. Existence of the view manager is exactly 
equivalent to consistency mode being on 
(`BasePartitionUpsertMetadataManager.java:159–167`). |
   | `UpsertUtils.java` | `pinot-segment-local` | `hasNoQueryableDocs` now: (1) 
consult the snapshot first; (2) if consistency mode is on but the snapshot is 
not yet populated for this segment — first refresh hasn't run or segment just 
tracked — return `false` conservatively; (3) otherwise fall back to live 
bitmaps as before. |
   
   Behavior on non-consistency-mode tables and on non-upsert tables is 
**identical** to today.
   
   ## Tests
   
   New `UpsertUtilsTest` with 7 cases — passing locally:
   
   ```
   [INFO] Running org.apache.pinot.segment.local.upsert.UpsertUtilsTest
   [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
   ```
   
   | Case | Expected |
   | --- | --- |
   | non-upsert, live queryable bitmap empty | `true` |
   | non-upsert, live queryable bitmap non-empty | `false` |
   | non-upsert, queryable `null`, valid bitmap empty | `true` |
   | non-upsert, both bitmaps `null` | `false` |
   | **consistency mode**, snapshot empty, live bitmap non-empty | `true` 
(snapshot wins) |
   | **consistency mode**, snapshot non-empty, live bitmap empty | `false` 
(snapshot wins) |
   | **consistency mode**, snapshot absent, live bitmap empty | `false` 
(conservative deferral) |
   
   Existing `UpsertViewManagerTest` still passes (no regressions on the touched 
API).
   
   ## Test plan
   
   - [x] `mvn -pl pinot-segment-spi,pinot-segment-local,pinot-core -am compile` 
— clean
   - [x] `mvn -pl pinot-segment-local test -Dtest=UpsertUtilsTest` — 7/7 pass
   - [x] `mvn -pl pinot-segment-local test -Dtest=UpsertViewManagerTest` — no 
regressions
   - [ ] CI green
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to