xiangfu0 opened a new pull request, #17338: URL: https://github.com/apache/pinot/pull/17338
This pull request enhances the aggregation optimizer in Pinot to support null-aware query rewrites when null handling is enabled. The changes ensure that aggregation expressions, especially for SUM, are rewritten to use `count(column)` instead of `count(1)` when null handling is turned on, improving correctness for queries involving nullable columns. The update also includes a new unit test to verify this behavior.

**Aggregation optimizer improvements:**

* Updated the `AggregationOptimizer` to detect if null handling is enabled via query options and adjust aggregation rewrites accordingly, particularly for SUM, by using `count(column)` instead of `count(1)` in rewrite expressions. [[1]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cR46-R54) [[2]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL238-R257) [[3]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL327-R346) [[4]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cR360-R366)
* Refactored optimizer methods to propagate the null handling flag through all relevant internal methods, ensuring consistent behavior throughout the optimizer logic. [[1]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL61-R67) [[2]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL84-R108) [[3]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL109-R119) [[4]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL121-R131) [[5]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL133-R150) [[6]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL162-R177) [[7]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL179-R189) [[8]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL190-R225) [[9]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL224-R237) [[10]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL256-R276) [[11]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cL277-R297)

**SQL parser and query compilation:**

* Modified the SQL parser (`CalciteSqlParser`) to pass query options directly to the PinotQuery compilation process, enabling downstream components to access query options such as null handling. [[1]](diffhunk://#diff-5bdf53db08d67e2823f300f3db3af05dc704f4ef5319299cc3233f9b55aba059L175-R176) [[2]](diffhunk://#diff-5bdf53db08d67e2823f300f3db3af05dc704f4ef5319299cc3233f9b55aba059R426-R431)

**Testing enhancements:**

* Added a new unit test in `AggregationOptimizerTest` to verify that the optimizer correctly rewrites SUM expressions with null handling enabled by checking for the use of `count(column)`.

**Dependency and import updates:**

* Added necessary imports for handling query options and null handling checks in both production and test code. [[1]](diffhunk://#diff-efb2c0467e3977e0d0e329ab039ea1009a30154e539a819774ecb7588dfcea2cR22-R28) [[2]](diffhunk://#diff-f8777e0863d50c6c6f7a18949bf6680e6f05dad5371f4e4f2e67f5681133a5d5R27-R36)

**Minor code maintenance:**

* Added missing import for `Collections` in the SQL parser to support new method signatures.
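
To illustrate why the choice between `count(column)` and `count(1)` matters for the SUM rewrite described in this pull request, here is a minimal, self-contained sketch. It does not use Pinot code. It assumes the rewrite has the shape `sum(column + constant)` → `sum(column) + constant * count(...)`, which the description does not spell out, and it assumes that with null handling enabled a null input row is skipped by SUM. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical demo class for illustration only; not part of Pinot.
final class NullAwareSumRewriteDemo {
  private NullAwareSumRewriteDemo() {
  }

  // Reference result for sum(column + c) under the assumed null semantics:
  // a null value makes (column + c) null, so that row is skipped.
  static long sumOfColumnPlusConstant(List<Long> column, long c) {
    return column.stream().filter(Objects::nonNull).mapToLong(v -> v + c).sum();
  }

  // sum(column), skipping nulls.
  static long sumColumn(List<Long> column) {
    return column.stream().filter(Objects::nonNull).mapToLong(Long::longValue).sum();
  }

  // count(1): counts every row, including rows where the column is null.
  static long countOne(List<Long> column) {
    return column.size();
  }

  // count(column): counts only rows where the column is non-null.
  static long countColumn(List<Long> column) {
    return column.stream().filter(Objects::nonNull).count();
  }

  // Assumed rewrite using count(1); over-counts when nulls are present.
  static long rewriteWithCountOne(List<Long> column, long c) {
    return sumColumn(column) + c * countOne(column);
  }

  // Assumed rewrite using count(column); matches the reference result.
  static long rewriteWithCountColumn(List<Long> column, long c) {
    return sumColumn(column) + c * countColumn(column);
  }

  public static void main(String[] args) {
    // Arrays.asList is used because it permits null elements.
    List<Long> column = Arrays.asList(1L, null, 3L);
    long c = 10L;
    System.out.println("sum(column + c)            = " + sumOfColumnPlusConstant(column, c));
    System.out.println("sum(column) + c * count(1)   = " + rewriteWithCountOne(column, c));
    System.out.println("sum(column) + c * count(col) = " + rewriteWithCountColumn(column, c));
  }
}
```

With the three-row input in `main`, the `count(1)` variant adds the constant once for the null row as well, so it disagrees with the reference result, while the `count(column)` variant agrees with it. When the column contains no nulls, both variants produce the same value.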
