YutaLin opened a new pull request, #3757: URL: https://github.com/apache/datafusion-comet/pull/3757
## Which issue does this PR close?

Comet does not currently support the Spark `percentile_cont` function, so queries that use it fall back to Spark's JVM execution instead of running natively on DataFusion.

`PercentileCont` calculates a percentile value based on a continuous distribution of numeric or ANSI interval columns at a given percentage. It implements the SQL `PERCENTILE_CONT` function, which uses linear interpolation between values when the exact percentile position falls between two data points. The expression is a runtime-replaceable aggregate that delegates to the internal `Percentile` implementation.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration. Array percentile and weighted percentile are not included in this PR.

Closes #3190

## What changes are included in this PR?

- Add a `PercentileCont` message to `expr.proto`.
- Add `CometPercentile` with validations.
- Register the `Percentile` class in `QueryPlanSerde`.
- Handle the `PercentileCont` protobuf in `planner.rs`.
- Add a custom `percentile.rs` that uses Binary state, because DataFusion's `percentile_cont` stores all values as `List<Float64>`, and shuffling that state causes a `Cannot cast list to non-list data types` error.

## How are these changes tested?

Add SQL tests covering numeric and interval inputs.
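
For reviewers, the linear interpolation that `PERCENTILE_CONT` performs can be sketched as follows. This is a minimal illustration, not the code in `percentile.rs`: the function name `percentile_cont_sketch` is hypothetical, it works on an in-memory `f64` slice rather than Arrow arrays or accumulator state, and it assumes the position is computed as `p * (n - 1)` over the sorted values, with no weights and no null handling.

```rust
/// Hypothetical helper for illustration only (not part of this PR).
/// Returns the continuous percentile of `values` at fraction `p` in [0, 1],
/// or `None` if the input is empty or `p` is out of range.
fn percentile_cont_sketch(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }

    // Sort a copy so the caller's slice is left untouched.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Assumed position formula: fractional index into the sorted values.
    let pos = p * (sorted.len() - 1) as f64;
    let lower = pos.floor() as usize;
    let upper = pos.ceil() as usize;
    let frac = pos - lower as f64;

    // Interpolate linearly between the two neighbouring data points.
    // When `pos` is an exact index, `lower == upper` and `frac == 0.0`.
    Some(sorted[lower] + (sorted[upper] - sorted[lower]) * frac)
}
```

For example, with the values 1, 2, 3, 4 and `p = 0.5`, the position is 1.5, so the result is halfway between 2 and 3.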
