xiangfu0 opened a new pull request, #17606:
URL: https://github.com/apache/pinot/pull/17606

   ### Motivation
   - Provide a config toggle to enable or disable dimension-table upsert/dedup 
logic so clusters can opt into queryable-doc-id filtering and upsert behavior 
for dimension tables. 
   - Ensure upsert-related processing (computing/applying per-segment queryable 
doc id bitmaps and enabling segment upsert state) is only performed when the 
feature is explicitly enabled.
   
   ### Description
   - Added an `enableUpsert` boolean to `DimensionTableConfig` (JSON property 
`enableUpsert`) and exposed `isUpsertEnabled()` in `pinot-spi`.
   - Read the new flag in `DimensionTableDataManager` and gate upsert-related 
logic behind `_enableUpsert`, including using queryable-doc-id snapshots when 
sizing/iterating segments and applying per-segment bitmaps.
   - Introduced a small `RecordLocation` type and helper methods 
`applyQueryableDocIdsForRecordLocations`, `applyQueryableDocIdsForLookupTable`, 
`applyQueryableDocIdsToSegments`, and `getQueryableDocIdsSnapshot` in 
`DimensionTableDataManager` to compute and apply per-segment 
`MutableRoaringBitmap` sets and call `ImmutableSegmentImpl.enableUpsert(...)` 
when appropriate.
   - Updated all test and helper call sites that construct 
`DimensionTableConfig` to pass the new flag.
   - Added integration coverage (`testDimensionTableUpsertSelection`) that 
creates a small OFFLINE upsert dimension table and asserts deduplicated 
selection/count results, and a unit test `testLookupRespectsQueryableDocIds` 
that verifies lookup respects queryable doc ids when upsert is enabled.
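
To illustrate the gating described above, here is a minimal, self-contained sketch. It is not the code from this PR: `QueryableDocIdGateSketch`, `Segment`, and `visibleRows` are hypothetical names, and `java.util.BitSet` stands in for the `MutableRoaringBitmap` used in `DimensionTableDataManager`. It only shows the intended behavior: rows are filtered by queryable doc ids when the flag is on, and left untouched when it is off.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class QueryableDocIdGateSketch {

  /** Hypothetical stand-in for a segment: its rows plus an optional queryable-doc-id set. */
  static final class Segment {
    final List<String> rows;
    final BitSet queryableDocIds; // null means no upsert state, so every doc is visible

    Segment(List<String> rows, BitSet queryableDocIds) {
      this.rows = rows;
      this.queryableDocIds = queryableDocIds;
    }
  }

  /** Returns the rows a lookup would see; filtering only happens when enableUpsert is true. */
  static List<String> visibleRows(Segment segment, boolean enableUpsert) {
    List<String> result = new ArrayList<>();
    for (int docId = 0; docId < segment.rows.size(); docId++) {
      boolean queryable = segment.queryableDocIds == null || segment.queryableDocIds.get(docId);
      if (!enableUpsert || queryable) {
        result.add(segment.rows.get(docId));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Doc 0 is an older version of key k1, so only docs 1 and 2 are queryable.
    BitSet queryable = new BitSet();
    queryable.set(1);
    queryable.set(2);
    Segment segment = new Segment(List.of("k1:v1", "k1:v2", "k2:v1"), queryable);

    System.out.println(visibleRows(segment, false)); // [k1:v1, k1:v2, k2:v1]
    System.out.println(visibleRows(segment, true));  // [k1:v2, k2:v1]
  }
}
```

With the flag off the sketch behaves as if no upsert state existed, which mirrors the goal that upsert-related processing only runs when explicitly enabled.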
   
   ### Testing
   - No automated test suites (`mvn`/CI) were executed as part of this change.
   - New tests: 
`MultiStageEngineIntegrationTest.testDimensionTableUpsertSelection` 
(integration) and 
`DimensionTableDataManagerTest.testLookupRespectsQueryableDocIds` (unit). 
Both were added in this change and have not yet been run.
   - Existing test usages and benchmark helpers were updated to pass the new 
config parameter where needed, and imports were adjusted accordingly.
   
   ------
   [Codex 
Task](https://chatgpt.com/codex/tasks/task_e_697af028b33c832d98fb7d8ff1035e4a)

