tarun11Mavani opened a new pull request, #18760:
URL: https://github.com/apache/pinot/pull/18760

   ## Summary
   
   Extends `FUNNEL_COUNT` to accept multiple columns in `CORRELATE_BY(col1, col2, ...)`,
   enabling funnel analysis that tracks users through steps within a composite key
   (e.g., per user per device category), not just a single dimension.
   
   ### Design
   
   The single-key aggregation path is preserved as a zero-overhead fast path, structurally
   identical to the original single-column implementation, so existing queries see no
   regression. Multi-key support is added as a separate code path selected once per block.
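
The "selected once per block" idea can be illustrated with a minimal sketch. This is not the code in this PR: the class names, the `aggregate` method, its parameters, and the `addSingleKey` / `addMultiKey` signatures are assumptions made for illustration, and `java.util.BitSet` stands in for whatever bitmap type the real strategies use. The point is only that the key-count check sits outside the row loop, so the single-key loop carries no extra per-row branch.

```java
import java.util.BitSet;

/** Hypothetical sketch; the real AggregationStrategy signatures may differ. */
abstract class StrategySketch {
  abstract void addSingleKey(int step, int dictId);
  abstract void addMultiKey(int step, int compositeId);

  /**
   * stepMatches[s][row] is true when the row satisfies step s.
   * keyColumns[c][row] is the dictionary ID of CORRELATE_BY column c for that row.
   * strides[c] is the multiplier for column c (ignored when there is one key column).
   * Assumes every composite ID fits in an int.
   */
  final void aggregate(int length, boolean[][] stepMatches, int[][] keyColumns, int[] strides) {
    if (keyColumns.length == 1) {
      // Fast path: the branch above runs once per block, not once per row.
      int[] keys = keyColumns[0];
      for (int row = 0; row < length; row++) {
        for (int step = 0; step < stepMatches.length; step++) {
          if (stepMatches[step][row]) {
            addSingleKey(step, keys[row]);
          }
        }
      }
    } else {
      // Multi-key path: fold the per-column dictionary IDs into one composite ID.
      for (int row = 0; row < length; row++) {
        int compositeId = 0;
        for (int col = 0; col < keyColumns.length; col++) {
          compositeId += keyColumns[col][row] * strides[col];
        }
        for (int step = 0; step < stepMatches.length; step++) {
          if (stepMatches[step][row]) {
            addMultiKey(step, compositeId);
          }
        }
      }
    }
  }
}

/** Hypothetical bitmap-style strategy: one bit set of correlation IDs per step. */
final class BitmapStrategySketch extends StrategySketch {
  final BitSet[] perStep;

  BitmapStrategySketch(int numSteps) {
    perStep = new BitSet[numSteps];
    for (int i = 0; i < numSteps; i++) {
      perStep[i] = new BitSet();
    }
  }

  @Override
  void addSingleKey(int step, int dictId) {
    perStep[step].set(dictId);
  }

  @Override
  void addMultiKey(int step, int compositeId) {
    perStep[step].set(compositeId);
  }
}
```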
   
   - **`AggregationStrategy`**: Split into two abstract methods (`addSingleKey` / `addMultiKey`)
     with separate aggregation loops for single-key and multi-key, eliminating per-row branching
     on the dominant single-key path.
   - **`DictIdsWrapper`**: Added composite-key mapping for multi-column `CORRELATE_BY`. Uses
     stride-based arithmetic when the product of dictionary sizes fits in `int`, falling back
     to a `HashMap<IntArrayList, Integer>` for large key spaces. Also adds `toCompositeString`
     for length-prefix encoded composite string keys used during result extraction.
   - **`SortedAggregationResult`**: Updated to handle multi-key by tracking secondary keys via
     a `HashMap` within each primary-key group (data is sorted on the primary column only).
   - **`BitmapAggregationStrategy`**, **`SortedAggregationStrategy`**,
     **`ThetaSketchAggregationStrategy`**: Implement both `addSingleKey` and `addMultiKey`.
   - **`SetResultExtractionStrategy`**, **`BitmapResultExtractionStrategy`**: Updated to
     reverse-map composite IDs back to per-column dictionary values during result extraction.
   - **`FunnelCountSortedAggregationFunction`**: Propagates multi-dictionary context through
     the sorted aggregation result extraction pipeline.
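
For readers unfamiliar with the composite-key mapping described for `DictIdsWrapper`, here is a rough sketch of the general technique. It is not the actual implementation: the class `CompositeKeySketch` and its method names are hypothetical, `List<Integer>` from the standard library stands in for fastutil's `IntArrayList`, and the `<length>:<value>` string format is only one possible length-prefix encoding (the exact format used in the PR is not stated here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of composite-key mapping; not the actual DictIdsWrapper. */
final class CompositeKeySketch {
  private final int[] dictSizes;
  private final int[] strides; // null when the key space does not fit in an int
  private final Map<List<Integer>, Integer> fallbackIds = new HashMap<>();
  private final List<int[]> fallbackKeys = new ArrayList<>();

  CompositeKeySketch(int[] dictSizes) {
    this.dictSizes = dictSizes.clone();
    long product = 1;
    for (int size : dictSizes) {
      if (size <= 0) {
        throw new IllegalArgumentException("dictionary size must be positive");
      }
      product *= size; // stays below 2^62 because we stop once it exceeds int range
      if (product > Integer.MAX_VALUE) {
        break;
      }
    }
    if (product <= Integer.MAX_VALUE) {
      // Row-major strides: the last column varies fastest.
      strides = new int[dictSizes.length];
      int stride = 1;
      for (int i = dictSizes.length - 1; i >= 0; i--) {
        strides[i] = stride;
        stride *= dictSizes[i];
      }
    } else {
      strides = null;
    }
  }

  /** Maps one dictionary ID per column to a single composite ID. */
  int toCompositeId(int[] dictIds) {
    if (strides != null) {
      int id = 0;
      for (int i = 0; i < dictIds.length; i++) {
        id += dictIds[i] * strides[i];
      }
      return id;
    }
    // Fallback: assign dense IDs in first-seen order.
    List<Integer> key = new ArrayList<>(dictIds.length);
    for (int dictId : dictIds) {
      key.add(dictId);
    }
    Integer existing = fallbackIds.get(key);
    if (existing != null) {
      return existing;
    }
    int id = fallbackKeys.size();
    fallbackIds.put(key, id);
    fallbackKeys.add(dictIds.clone());
    return id;
  }

  /** Reverse-maps a composite ID to per-column dictionary IDs. */
  int[] toDictIds(int compositeId) {
    if (strides == null) {
      return fallbackKeys.get(compositeId).clone();
    }
    int[] dictIds = new int[strides.length];
    for (int i = 0; i < strides.length; i++) {
      dictIds[i] = (compositeId / strides[i]) % dictSizes[i];
    }
    return dictIds;
  }

  /** Length-prefix encoding, so ("a", "bc") and ("ab", "c") stay distinct. */
  static String toCompositeString(String[] values) {
    StringBuilder sb = new StringBuilder();
    for (String value : values) {
      sb.append(value.length()).append(':').append(value);
    }
    return sb.toString();
  }
}
```

With dictionary sizes `{3, 4}` the strides are `{4, 1}`, so the dictionary IDs `{2, 1}` map to composite ID `2 * 4 + 1 = 9`, and `toDictIds(9)` recovers `{2, 1}`. The length prefix is what keeps string keys unambiguous: plain concatenation would turn both `("a", "bc")` and `("ab", "c")` into `abc`, whereas this sketch yields `1:a2:bc` and `2:ab1:c`.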
   
   ### Example Query
   
   ```sql
   SELECT FUNNEL_COUNT(
     STEPS(step1_col, step2_col, step3_col),
     CORRELATE_BY(user_id, device_category),
     SETTINGS('theta_sketch')
   ) FROM myTable
   ```
   
   ### Test Plan
   - Existing single-key funnel integration tests pass unchanged
   - New multi-key integration tests: `testMultiKeyOverall`, `testMultiKeyGroupBy`,
     `testMultiKeyWithFilter`, `testMultiKeyEmptyResult`
   - All strategies tested: BITMAP, SORTED, THETA_SKETCH, SET
   - JMH benchmarks verify zero regression on single-key path
   - Multi-key path benchmarked for throughput baseline


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

