My mistake. I was thinking of CUME_DIST, which is the inverse of 
PERCENTILE_DIST and returns a fraction.

I agree that the return type of “PERCENTILE_DIST(fraction) WITHIN GROUP (ORDER 
BY x)” should be the same as the type of “x”.

> On Aug 7, 2022, at 12:59 AM, Itiel Sadeh <[email protected]> 
> wrote:
> 
> opened a ticket in https://issues.apache.org/jira/browse/CALCITE-5230 
> <https://issues.apache.org/jira/browse/CALCITE-5230>.
> Maybe I'm missing something, but I think that in your example, the
> percentile_disc will calculate the percentile that you specify and the
> output will be the appropriate value from the input column.
> 
> We can continue this discussion in the Jira ticket.
> 
> Itiel
> On Fri, Aug 5, 2022 at 9:00 PM Julian Hyde <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> I’m not sure what Postgres has done, but it doesn’t make sense to derive
>> the return type of PERCENTILE_DISC from the argument type, even though both
>> are typically numeric types. (Strictly speaking, the argument to
>> PERCENTILE_DISC can be any type that has a ‘-‘ operation, so it ought to
>> work on DATE, TIMESTAMP, INTERVAL as well as numeric.)
>> 
>> To take an extreme example, if I am a geologist, the argument to
>> PERCENTILE_DISC might be the age of my rock samples in millions of years,
>> and the output will be a number between 0 and 1.
>> 
>> I think your problem is that you don’t like the default data type returned
>> by PERCENTILE_DISC. The solution would be to make the policy customizable
>> via RelDataTypeSystem, as we did for other aggregate functions in
>> https://issues.apache.org/jira/browse/CALCITE-1945 <
>> https://issues.apache.org/jira/browse/CALCITE-1945 
>> <https://issues.apache.org/jira/browse/CALCITE-1945>>.
>> 
>> Can you please log a Jira case for this, and we can discuss further there?
>> 
>> Julian
>> 
>>> On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <[email protected]>
>> wrote:
>>> 
>>> Hi Julian, thanks for your email.
>>> 
>>> I don't think those tickets will solve the problem as from what I
>>> understand, they are referring to the input types, while my issue is with
>>> the return type (although it might have the same root cause).
>>> 
>>> As you can see in the code
>>> <
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304
>>> ,
>>> right now the return type is hard coded as DOUBLE. The problem is that in
>>> the following query:
>>> 
>>>> select percentile_disc(0.12) within group(order by x) from t;
>>> 
>>> The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
>>> be outside of it (in the WITHIN_GROUP call). so the return type inference
>>> cannot use the type of the column "x" when it's inferring the return
>> type.
>>> 
>>> Itiel
>>> 
>>> On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <[email protected]>
>> wrote:
>>> 
>>>> I believer that PERCENTILE_DISC was implemented in
>>>> https://issues.apache.org/jira/browse/CALCITE-4644 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
>>>> issues to support any sortable type [
>>>> https://issues.apache.org/jira/browse/CALCITE-4670 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to
>> change
>>>> the implementation strategy [
>>>> https://issues.apache.org/jira/browse/CALCITE-4666 <
>>>> https://issues.apache.org/jira/browse/CALCITE-4666> ].
>>>> 
>>>> Does your case map onto any of those?
>>>> 
>>>> Is PERCENTILE_DISC able to deduce return types from its arguments today?
>>>> If so, how does it do it?
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <[email protected]
>>> 
>>>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> First of all I just wanted to thank all of you for the work you are
>> doing
>>>>> on Calcite.
>>>>> 
>>>>> We want to add support for the percentile_disc aggregate function, but
>> we
>>>>> encounter a problem.
>>>>> Our percentile_disc return type is dependent on the column of the
>> "order
>>>>> by" clause (just like postgresql
>>>>> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
>>>>> However, I don't see how to achieve that on calcite. The issue is that
>>>> the
>>>>> sort column argument is not passed to the percentile_disc function,
>>>> Rather,
>>>>> it is stored outside of it as a collation. Therefore, I cannot use the
>>>>> SqlReturnTypeInference mechanism.
>>>>> 
>>>>> If someone has an idea on how to achieve this it would be very much
>>>>> appreciated.
>>>>> Thank you,
>>>>> 
>>>>> Itiel

Reply via email to