[PR] feat: Support Spark expression: percentile_cont [datafusion-comet]

via GitHub Sat, 21 Mar 2026 13:50:24 -0700


YutaLin opened a new pull request, #3757:
URL: https://github.com/apache/datafusion-comet/pull/3757


   ## Which issue does this PR close?
   
   Comet does not currently support the Spark percentile_cont function, causing 
queries using this function to fall back to Spark's JVM execution instead of 
running natively on DataFusion.
   
   `PercentileCont` calculates a percentile value based on a continuous 
distribution of numeric or ANSI interval columns at a given percentage. It 
implements the SQL `PERCENTILE_CONT` function, which uses linear interpolation 
between values when the exact percentile position falls between two data 
points. This expression is a runtime-replaceable aggregate that delegates to 
the internal `Percentile` implementation.
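
For illustration only, the linear interpolation described above can be sketched as follows. This is not the code in this PR: the function name `percentile_cont` and its signature are hypothetical, and it assumes the position is computed as `p * (n - 1)` over the sorted values, with `p` in `[0.0, 1.0]`.

```rust
/// Hypothetical sketch of a continuous percentile with linear interpolation.
/// `p` is the percentage in [0.0, 1.0]. Returns `None` for empty input or an
/// out-of-range (or NaN) percentage.
fn percentile_cont(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }

    // Sort a copy so the caller's slice is left untouched.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Assumed position formula: fractional index into the sorted values.
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;

    // When `pos` is an exact index, lo == hi and the value is returned as-is;
    // otherwise interpolate between the two neighbouring values.
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

With the values 1, 2, 3, 4 and a percentage of 0.5, the position is 1.5, so the sketch interpolates halfway between 2 and 3.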
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration. 
   
   Array percentile and weighted percentile are not included in this PR.
   Closes #3190 
   
   
   ## What changes are included in this PR?
   Add a `PercentileCont` message to `expr.proto`
   Add `CometPercentile` with validations
   Register the `Percentile` class in `QueryPlanSerde`
   Handle the `PercentileCont` protobuf message in `planner.rs`
   Add a custom `percentile.rs` that uses Binary state, because DataFusion's 
`percentile_cont` stores all values as `List<Float64>`, and shuffling that 
state fails with a `Cannot cast list to non-list data types` error
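
As a rough illustration of the Binary-state idea, the sketch below packs buffered `f64` values into a flat byte buffer, so that intermediate state is a single binary value rather than a list. The helper names (`encode_state`, `decode_state`, `merge_states`) and the little-endian 8-bytes-per-value layout are assumptions made for this example; the actual encoding used in `percentile.rs` may differ.

```rust
/// Hypothetical encoding: 8 little-endian bytes per buffered f64 value.
fn encode_state(values: &[f64]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(values.len() * 8);
    for v in values {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

/// Decode a buffer produced by `encode_state`.
/// Returns `None` if the length is not a multiple of 8.
fn decode_state(bytes: &[u8]) -> Option<Vec<f64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut arr = [0u8; 8];
                arr.copy_from_slice(chunk);
                f64::from_le_bytes(arr)
            })
            .collect(),
    )
}

/// Merge two partial states. With this layout, merging is concatenation;
/// sorting is deferred until the final percentile is evaluated.
fn merge_states(a: &[u8], b: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out
}
```

Under this assumed layout, partial states from different partitions can be shuffled as opaque binary values and combined without any list-typed column.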
   
   ## How are these changes tested?
   Added SQL tests covering numeric and interval inputs.
   

