[ 
https://issues.apache.org/jira/browse/ARROW-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606681#comment-17606681
 ] 

Weston Pace edited comment on ARROW-17484 at 9/19/22 5:19 PM:
--------------------------------------------------------------

Aggregate functions typically have very small outputs compared to the input 
(e.g. the sum of 1 million rows is a single value) and so it very often makes 
sense for the output type to be larger than the input type.

One could argue that you can simply cast beforehand.  However, you would have 
to cast the entire array of inputs (e.g. the 1 million rows) and this could be 
rather costly.
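
As a rough illustration of the two points above (a minimal sketch, not Arrow's actual kernel), the following pure-Python snippet sums one million int8 values in two ways: once with an accumulator as narrow as the input, and once with a wider one. The helper names (wrap_int8, sum_same_width, sum_widened) are made up for this example, and Python's unbounded int stands in for an int64 accumulator. It also shows what casting beforehand would materialize: a second array eight times the size of the input.

```python
from array import array


def wrap_int8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range, mimicking int8 overflow."""
    return (value + 128) % 256 - 128


def sum_same_width(values) -> int:
    """Sum with an accumulator as narrow as the input (int8), so it wraps."""
    acc = 0
    for v in values:
        acc = wrap_int8(acc + v)
    return acc


def sum_widened(values) -> int:
    """Sum with a wider accumulator; only the single output value is wide."""
    acc = 0  # Python ints are unbounded; this stands in for int64
    for v in values:
        acc += v
    return acc


# One million int8 values, one byte each.
inputs = array("b", [1]) * 1_000_000

narrow = sum_same_width(inputs)  # overflows repeatedly and wraps around
wide = sum_widened(inputs)       # the true sum

# Casting beforehand would mean building a second, int64-typed copy.
cast_inputs = array("q", list(inputs))
input_bytes = inputs.itemsize * len(inputs)
cast_bytes = cast_inputs.itemsize * len(cast_inputs)

print(narrow, wide, input_bytes, cast_bytes)
```

Widening only the accumulator costs a few extra bytes for the result, whereas the up-front cast pays for every input row.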

Finally, we are mirroring SQL here (which is not, by itself, necessarily a good 
thing, but it is worth noting).  According to the [postgres 
docs|https://www.postgresql.org/docs/8.2/functions-aggregate.html], the return 
type of sum is:

{quote}
bigint for smallint or int arguments, numeric for bigint arguments, double 
precision for floating-point arguments, otherwise the same as the argument data 
type
{quote}
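
The quoted rule can be read as a small lookup table. The sketch below is only an illustration of that rule, not code from Postgres or Arrow: the function name postgres_sum_return_type is hypothetical, and the exact set of type-name strings (in particular treating "real" and "double precision" as the floating-point arguments) is an assumption made for the example.

```python
def postgres_sum_return_type(arg_type: str) -> str:
    """Return the result type of sum() for an argument type, per the quoted rule.

    The type-name strings are assumptions chosen for this illustration.
    """
    widened = {
        # "bigint for smallint or int arguments"
        "smallint": "bigint",
        "int": "bigint",
        # "numeric for bigint arguments"
        "bigint": "numeric",
        # "double precision for floating-point arguments"
        "real": "double precision",
        "double precision": "double precision",
    }
    # "otherwise the same as the argument data type"
    return widened.get(arg_type, arg_type)


for t in ("smallint", "int", "bigint", "real", "numeric"):
    print(t, "->", postgres_sum_return_type(t))
```

The result is at least as wide as the argument in every case, which matches the widening behaviour described in this comment.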









> [C++] Substrait to Arrow Aggregate doesn't take the provided Output Type for 
> aggregates
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-17484
>                 URL: https://issues.apache.org/jira/browse/ARROW-17484
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Vibhatha Lakmal Abeykoon
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>
> The current Substrait to Aggregate deserializer doesn't take the plan 
> provided output type as the output type of the execution plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
