khwilson commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-2336776704

   I know this is old, but I was curious about this. It seems that for _binary_ 
operations, PyArrow follows the rules of Redshift (see #40123). However, for 
array aggregators (specifically, `sum`) it seems to retain the input type 
(which is definitely wrong). I was curious what some other databases do, and so 
attempted to put together a short list. The tl;dr is that @westonpace 's 
suggestion to simply promote the `sum` to `decimal128(38, scale)` or 
`decimal256(76, scale)` seems as good of a choice as any.
   
   ### Data
   
   In each case, I ran approximately the following query in the database's 
dialect (this query is for Clickhouse):
   
   ```sql
   with base_values as (
     select toDecimal128(1, 1) as foo
     union all
     select toDecimal128(1, 1) as foo
   )
   select toTypeName(sum(foo))
   from base_values
   ```
   
   #### Clickhouse
   
   Like Arrow, the return type of `sum` on a column is the input type, but 
Clickhouse [doesn't retain precision 
information](https://clickhouse.com/docs/en/sql-reference/data-types/decimal) 
in its decimal type. So this is basically what @westonpace suggests the 
solution for Arrow should be. 
   
   #### Postgres
   
   The return type of `sum` appears to be an ["unconstrained numeric" 
type](https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL).
 This was verified by running
   
   ```sql
   create table sum_columns as
   select 
     sum(foo) as the_sum
   from base_values;
   
   select
     table_name,
     column_name,
     numeric_precision,
     numeric_scale
   from information_schema.columns
   where table_name in ('sum_columns');
   ```
   
   and noting that `numeric_precision` and `numeric_scale` are both `NULL`.
   
   #### MariaDB
   
   The return type of `sum` appears to be `numeric(min(65, 22 + precision), 
scale)`. This was verified in the same way as postgres. N.B. 65 is the [maximum 
width](https://mariadb.com/kb/en/decimal/) of a decimal in MariaDB
   
   #### duckdb
   
   The return type of `sum` appears to be `decimal(38, s)` no matter its 
[internal 
representation](https://duckdb.org/docs/sql/data_types/numeric.html#fixed-point-decimals).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to