aryan-212 commented on PR #21388:
URL: https://github.com/apache/datafusion/pull/21388#issuecomment-4197155020

   ### How Databricks treats `percentile` vs `approx_percentile`
   
   Databricks draws a clear semantic distinction between its exact and approximate percentile functions:
   
   | Function | Semantics | Behavior |
   |---|---|---|
   | `percentile` / `percentile_cont` | **Continuous**: interpolates between adjacent values | `median([1, 2])` = **1.5** |
   | `percentile_approx` / `approx_percentile` | **Discrete**: returns an actual observed value from the dataset | `approx_median([1, 2])` = **1** |
   
   This was verified by running the equivalent window query on Databricks 
against the same 21-row dataset used in DataFusion's `window_using_aggregates` 
test. The Databricks output confirmed that `percentile_approx` picks the 
nearest-rank value (no interpolation), while `percentile` interpolates.
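
To make the two semantics concrete, here is a minimal Python sketch of both behaviors on the `[1, 2]` example from the table. The helper names `percentile_cont` and `percentile_disc` are hypothetical, and the `ceil(p * n)` nearest-rank rule is an assumption chosen for illustration; this is not Databricks' or DataFusion's actual implementation, and the real `percentile_approx` works from an approximate summary rather than a full sort.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile: linearly interpolate between adjacent sorted values."""
    if not values:
        raise ValueError("values must be non-empty")
    xs = sorted(values)
    pos = p * (len(xs) - 1)          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def percentile_disc(values, p):
    """Discrete percentile: return an observed value, with no interpolation.

    Uses the nearest-rank rule rank = ceil(p * n), which is an assumption
    made for this sketch.
    """
    if not values:
        raise ValueError("values must be non-empty")
    xs = sorted(values)
    rank = max(1, math.ceil(p * len(xs)))  # 1-based rank, clamped for p = 0
    return xs[rank - 1]


print(percentile_cont([1, 2], 0.5))  # 1.5
print(percentile_disc([1, 2], 0.5))  # 1
```

With an even number of rows the two definitions disagree on the median, which is the difference the table describes.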


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

