Re: Percentile_Disc return type

Julian Hyde Fri, 05 Aug 2022 11:00:00 -0700

I’m not sure what Postgres has done, but it doesn’t make sense to derive the 
return type of PERCENTILE_DISC from the argument type, even though both are 
typically numeric types. (Strictly speaking, the argument to PERCENTILE_DISC 
can be any type that has a ‘-‘ operation, so it ought to work on DATE, 
TIMESTAMP, INTERVAL as well as numeric.)


To take an extreme example, if I am a geologist, the argument to 
PERCENTILE_DISC might be the age of my rock samples in millions of years, and 
the output will be a number between 0 and 1.

I think your problem is that you don’t like the default data type returned by 
PERCENTILE_DISC. The solution would be to make the policy customizable via 
RelDataTypeSystem, as we did for other aggregate functions in 
https://issues.apache.org/jira/browse/CALCITE-1945 
<https://issues.apache.org/jira/browse/CALCITE-1945>.

Can you please log a Jira case for this, and we can discuss further there?

Julian

> On Aug 5, 2022, at 4:23 AM, Itiel Sadeh <[email protected]> wrote:
> 
> Hi Julian, thanks for your email.
> 
> I don't think those tickets will solve the problem as from what I
> understand, they are referring to the input types, while my issue is with
> the return type (although it might have the same root cause).
> 
> As you can see in the code
> <https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L2304>,
> right now the return type is hard coded as DOUBLE. The problem is that in
> the following query:
> 
>> select percentile_disc(0.12) within group(order by x) from t;
> 
> The PERCENTILE_DISC argument will be only 0.12, and the argument "x" will
> be outside of it (in the WITHIN_GROUP call). so the return type inference
> cannot use the type of the column "x" when it's inferring the return type.
> 
> Itiel
> 
> On Thu, Aug 4, 2022 at 9:43 PM Julian Hyde <[email protected]> wrote:
> 
>> I believer that PERCENTILE_DISC was implemented in
>> https://issues.apache.org/jira/browse/CALCITE-4644 <
>> https://issues.apache.org/jira/browse/CALCITE-4644>. There are open
>> issues to support any sortable type [
>> https://issues.apache.org/jira/browse/CALCITE-4670 <
>> https://issues.apache.org/jira/browse/CALCITE-4670> ] and also to change
>> the implementation strategy [
>> https://issues.apache.org/jira/browse/CALCITE-4666 <
>> https://issues.apache.org/jira/browse/CALCITE-4666> ].
>> 
>> Does your case map onto any of those?
>> 
>> Is PERCENTILE_DISC able to deduce return types from its arguments today?
>> If so, how does it do it?
>> 
>> Julian
>> 
>> 
>>> On Aug 3, 2022, at 1:40 PM, Itiel Sadeh <[email protected]>
>> wrote:
>>> 
>>> Hello,
>>> 
>>> First of all I just wanted to thank all of you for the work you are doing
>>> on Calcite.
>>> 
>>> We want to add support for the percentile_disc aggregate function, but we
>>> encounter a problem.
>>> Our percentile_disc return type is dependent on the column of the "order
>>> by" clause (just like postgresql
>>> <https://www.postgresql.org/docs/current/functions-aggregate.html>).
>>> However, I don't see how to achieve that on calcite. The issue is that
>> the
>>> sort column argument is not passed to the percentile_disc function,
>> Rather,
>>> it is stored outside of it as a collation. Therefore, I cannot use the
>>> SqlReturnTypeInference mechanism.
>>> 
>>> If someone has an idea on how to achieve this it would be very much
>>> appreciated.
>>> Thank you,
>>> 
>>> Itiel
>> 
>>

Re: Percentile_Disc return type

Reply via email to