xiangfu0 opened a new pull request, #16605:
URL: https://github.com/apache/pinot/pull/16605
This PR adds a smart distinct count aggregator backed by UltraLogLog (ULL):

- New function: distinctCountSmartULL(expression, 'threshold=...;p=...')
  - Starts with exact set accumulation and promotes to ULL once the threshold is exceeded
  - Parameters:
    - threshold: number of distinct values that triggers promotion (default 100_000; <=0 disables promotion)
    - p: ULL precision parameter p (default CommonConstants.Helix.DEFAULT_ULTRALOGLOG_P)

Implementation details:
- pinot-core: DistinctCountSmartULLAggregationFunction (set→ULL)
- pinot-segment-spi: AggregationFunctionType.DISTINCTCOUNTSMARTULL
- pinot-core: AggregationFunctionFactory wiring
- Planner/runtime:
  - AggregationPlanNode: dictionary-based eligibility
  - NonScanBasedAggregationOperator: dictionary paths for ULL/RAWULL/SmartULL

Tests:
- Query-level tests added, mirroring existing SmartHLL coverage
- Enum recognition updated

Notes:
- Keeps parity with SmartHLL semantics; uses hash4j wyhash for ULL
- Maintains BYTES-serialized merge paths where applicable

After this lands, we can consider multi-value (MV) variants and end-to-end serialized ULL exports where useful.
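The options string in `distinctCountSmartULL(expression, 'threshold=...;p=...')` is a list of `key=value` pairs separated by `;`. The following is a minimal Java sketch of how such a string could be parsed. The class name `SmartUllParams`, the case-insensitive keys, the whitespace trimming, and the rejection of unknown keys are all assumptions for illustration, not Pinot's actual parser. The default threshold of 100_000 and the "<=0 disables promotion" rule come from the PR description. The default `p` is passed in by the caller because the value of `CommonConstants.Helix.DEFAULT_ULTRALOGLOG_P` is not stated here.

```java
import java.util.Locale;

/**
 * Hypothetical holder for the distinctCountSmartULL options string, e.g. "threshold=50000;p=14".
 * Assumptions (not Pinot's real parser): keys are case-insensitive, surrounding whitespace is
 * ignored, and unknown keys are rejected.
 */
final class SmartUllParams {
  // Default promotion threshold, as stated in the PR description.
  static final int DEFAULT_THRESHOLD = 100_000;

  final int threshold;
  final int p;

  private SmartUllParams(int threshold, int p) {
    this.threshold = threshold;
    this.p = p;
  }

  /** Parses ';'-separated key=value pairs; defaultP stands in for the real Pinot constant. */
  static SmartUllParams parse(String params, int defaultP) {
    int threshold = DEFAULT_THRESHOLD;
    int p = defaultP;
    if (params != null) {
      for (String pair : params.split(";")) {
        String trimmed = pair.trim();
        if (trimmed.isEmpty()) {
          continue; // tolerate an empty string or a trailing ';'
        }
        String[] kv = trimmed.split("=", 2);
        if (kv.length != 2) {
          throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
        }
        String key = kv[0].trim().toLowerCase(Locale.ROOT);
        int value = Integer.parseInt(kv[1].trim());
        switch (key) {
          case "threshold":
            threshold = value;
            break;
          case "p":
            p = value;
            break;
          default:
            throw new IllegalArgumentException("Unknown parameter: " + key);
        }
      }
    }
    return new SmartUllParams(threshold, p);
  }

  /** Per the PR description, a threshold <= 0 disables promotion to ULL. */
  boolean promotionEnabled() {
    return threshold > 0;
  }
}
```

For example, `SmartUllParams.parse("threshold=50000;p=14", defaultP)` yields a threshold of 50000 and a `p` of 14. An empty string falls back to the defaults.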

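The following small, self-contained Java sketch illustrates the set→ULL promotion pattern described in this PR. Values accumulate in an exact `HashSet` until its size exceeds the threshold. At that point, the set is replayed into a sketch and then dropped. The sketch is hypothetical in several respects:

- `SmartDistinctCounter` is an illustrative name.
- A plain HyperLogLog replaces UltraLogLog.
- A SplitMix64-style mixer replaces hash4j's wyhash.

Because of these substitutions, it will not reproduce Pinot's estimates. The merge rules are also an assumption about how partial results could combine, not a description of the PR's BYTES-serialized merge path. Two exact sets are combined as a union, an exact set is replayed into a sketch, and two sketches with the same `p` are merged by taking the register-wise max.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical smart distinct counter: exact HashSet first, then a HyperLogLog stand-in for ULL. */
final class SmartDistinctCounter {
  private final int threshold; // <= 0 disables promotion
  private final int p; // 2^p registers; this stand-in accepts 7 <= p <= 18
  private Set<Long> exact = new HashSet<>();
  private byte[] registers; // null until promoted

  SmartDistinctCounter(int threshold, int p) {
    if (p < 7 || p > 18) {
      throw new IllegalArgumentException("Unsupported p for this sketch: " + p);
    }
    this.threshold = threshold;
    this.p = p;
  }

  /** SplitMix64-style 64-bit mixer, used here as a stand-in for hash4j's wyhash. */
  static long hash(long value) {
    long z = value + 0x9e3779b97f4a7c15L;
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  boolean isPromoted() {
    return registers != null;
  }

  void add(long value) {
    if (isPromoted()) {
      addHash(hash(value));
      return;
    }
    exact.add(value);
    if (threshold > 0 && exact.size() > threshold) {
      promote(); // the threshold has been exceeded
    }
  }

  /** Replays the exact set into a fresh sketch, then drops the set. */
  private void promote() {
    registers = new byte[1 << p];
    for (long v : exact) {
      addHash(hash(v));
    }
    exact = null;
  }

  private void addHash(long h) {
    int index = (int) (h >>> (64 - p)); // top p bits select the register
    int rank = Math.min(Long.numberOfLeadingZeros(h << p), 64 - p) + 1; // first 1-bit in the rest
    if (rank > registers[index]) {
      registers[index] = (byte) rank;
    }
  }

  /** Merges another partial result (e.g. from another segment); p must match once promoted. */
  void merge(SmartDistinctCounter other) {
    if (!other.isPromoted()) {
      for (long v : other.exact) {
        add(v);
      }
      return;
    }
    if (other.p != p) {
      throw new IllegalArgumentException("Cannot merge sketches with different p");
    }
    if (!isPromoted()) {
      promote();
    }
    for (int i = 0; i < registers.length; i++) {
      registers[i] = (byte) Math.max(registers[i], other.registers[i]);
    }
  }

  /** Exact count before promotion; HyperLogLog estimate with linear counting after. */
  long estimate() {
    if (!isPromoted()) {
      return exact.size();
    }
    int m = registers.length;
    double sum = 0;
    int zeros = 0;
    for (byte r : registers) {
      sum += Math.pow(2.0, -r);
      zeros += (r == 0) ? 1 : 0;
    }
    // Raw HLL estimate: alpha_m * m^2 / sum(2^-r), evaluated in double to avoid int overflow.
    double e = 0.7213 / (1 + 1.079 / m) * m * m / sum;
    if (e <= 2.5 * m && zeros > 0) {
      e = m * Math.log((double) m / zeros); // small-range (linear counting) correction
    }
    return Math.round(e);
  }
}
```

Keeping the exact set gives exact answers for low-cardinality groups, where memory is cheap. Promotion caps memory at 2^p one-byte registers for high-cardinality groups. The threshold parameter controls this trade-off.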