My mistake. I was thinking of CUME_DIST, which is the inverse of PERCENTILE_DIST and returns a fraction.
I agree that the return type of “PERCENTILE_DIST(fraction) WITHIN GROUP (ORDER BY x)” should be the same as the type of “x”. > On Aug 7, 2022, at 12:59 AM, Itiel Sadeh <[email protected]> > wrote: > > opened a ticket in https://issues.apache.org/jira/browse/CALCITE-5230 > <https://issues.apache.org/jira/browse/CALCITE-5230>. > Maybe I'm missing something, but I think that in your example, the > percentile_disc will calculate the percentile that you specify and the > output will be the appropriate value from the input column. > > We can continue this discussion in the Jira ticket. > > Itiel > On Fri, Aug 5, 2022 at 9:00 PM Julian Hyde <[email protected] > <mailto:[email protected]>> wrote: > >> I’m not sure what Postgres has done, but it doesn’t make sense to derive >> the return type of PERCENTILE_DISC from the argument type, even though both >> are typically numeric types. (Strictly speaking, the argument to >> PERCENTILE_DISC can be any type that has a ‘-‘ operation, so it ought to >> work on DATE, TIMESTAMP, INTERVAL as well as numeric.) >> >> To take an extreme example, if I am a geologist, the argument to >> PERCENTILE_DISC might be the age of my rock samples in millions of years, >> and the output will be a number between 0 and 1. >> >> I think your problem is that you don’t like the default data type returned >> by PERCENTILE_DISC. The solution would be to make the policy customizable >> via RelDataTypeSystem, as we did for other aggregate functions in >> https://issues.apache.org/jira/browse/CALCITE-1945 < >> https://issues.apache.org/jira/browse/CALCITE-1945 >> <https://issues.apache.org/jira/browse/CALCITE-1945>>. >> >> Can you please log a Jira case for this, and we can discuss further there? >> >> Julian >> >>> On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <[email protected]> >> wrote: >>> >>> Hi Julian, thanks for your email. >>> >>> I don't think those tickets will solve the problem as from what I >>> understand, they are referring to the input types, while my issue is with >>> the return type (although it might have the same root cause). >>> >>> As you can see in the code >>> < >> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304 >>> , >>> right now the return type is hard coded as DOUBLE. The problem is that in >>> the following query: >>> >>>> select percentile_disc(0.12) within group(order by x) from t; >>> >>> The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will >>> be outside of it (in the WITHIN_GROUP call). so the return type inference >>> cannot use the type of the column "x" when it's inferring the return >> type. >>> >>> Itiel >>> >>> On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <[email protected]> >> wrote: >>> >>>> I believer that PERCENTILE_DISC was implemented in >>>> https://issues.apache.org/jira/browse/CALCITE-4644 < >>>> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open >>>> issues to support any sortable type [ >>>> https://issues.apache.org/jira/browse/CALCITE-4670 < >>>> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to >> change >>>> the implementation strategy [ >>>> https://issues.apache.org/jira/browse/CALCITE-4666 < >>>> https://issues.apache.org/jira/browse/CALCITE-4666> ]. >>>> >>>> Does your case map onto any of those? >>>> >>>> Is PERCENTILE_DISC able to deduce return types from its arguments today? >>>> If so, how does it do it? >>>> >>>> Julian >>>> >>>> >>>>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <[email protected] >>> >>>> wrote: >>>>> >>>>> Hello, >>>>> >>>>> First of all I just wanted to thank all of you for the work you are >> doing >>>>> on Calcite. >>>>> >>>>> We want to add support for the percentile_disc aggregate function, but >> we >>>>> encounter a problem. >>>>> Our percentile_disc return type is dependent on the column of the >> "order >>>>> by" clause (just like postgresql >>>>> <https://www.postgresql.org/docs/current/functions-aggregate.html>). >>>>> However, I don't see how to achieve that on calcite. The issue is that >>>> the >>>>> sort column argument is not passed to the percentile_disc function, >>>> Rather, >>>>> it is stored outside of it as a collation. Therefore, I cannot use the >>>>> SqlReturnTypeInference mechanism. >>>>> >>>>> If someone has an idea on how to achieve this it would be very much >>>>> appreciated. >>>>> Thank you, >>>>> >>>>> Itiel
