aryan-212 commented on PR #21388: URL: https://github.com/apache/datafusion/pull/21388#issuecomment-4197155020
### How Databricks treats `percentile` vs `approx_percentile`

Databricks draws a clear semantic distinction between its two percentile functions:

| Function | Semantics | Behavior |
|---|---|---|
| `percentile` / `percentile_cont` | **Continuous**: interpolates between adjacent values | `median([1, 2])` = **1.5** |
| `percentile_approx` / `approx_percentile` | **Discrete**: returns an actual observed value from the dataset | `approx_median([1, 2])` = **1** |

This was verified by running the equivalent window query on Databricks against the same 21-row dataset used in DataFusion's `window_using_aggregates` test. The Databricks output confirmed that `percentile_approx` picks the nearest-rank value (no interpolation), while `percentile` interpolates.
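To make the two semantics concrete, here is a minimal Python sketch of each rule. It is not the Databricks or DataFusion implementation: the function names `percentile_cont` and `percentile_disc` are hypothetical helpers, and the exact nearest-rank rule (smallest value whose cumulative rank reaches `p`) is an assumption chosen to reproduce the behavior in the table above. The real `percentile_approx` works on an approximate summary of the data rather than a full sort.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile: linearly interpolate between adjacent sorted values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = p * (len(xs) - 1)          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_disc(values, p):
    """Discrete percentile (assumed nearest-rank rule): always an observed value."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    rank = max(1, math.ceil(p * len(xs)))  # 1-based rank of the chosen element
    return xs[rank - 1]


print(percentile_cont([1, 2], 0.5))  # interpolates between 1 and 2
print(percentile_disc([1, 2], 0.5))  # picks an element of the input
```

For the two-element input `[1, 2]`, the continuous variant returns the midpoint while the discrete variant returns the lower observed value, which is the distinction the table draws.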
