khwilson commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-2336776704
I know this is old, but I was curious about this. It seems that for _binary_
operations, PyArrow follows the rules of Redshift (see #40123). However, for
array aggregators (specifically, `sum`) it seems to retain the input type
(which is definitely wrong). I was curious what some other databases do, and so
attempted to put together a short list. The tl;dr is that @westonpace 's
suggestion to simply promote the `sum` to `decimal128(38, scale)` or
`decimal256(76, scale)` seems as good of a choice as any.
### Data
In each case, I ran approximately the following query in the database's
dialect (this query is for Clickhouse):
```sql
with base_values as (
select toDecimal128(1, 1) as foo
union all
select toDecimal128(1, 1) as foo
)
select toTypeName(sum(foo))
from base_values
```
#### Clickhouse
Like Arrow, the return type of `sum` on a column is the input type, but
Clickhouse [doesn't retain precision
information](https://clickhouse.com/docs/en/sql-reference/data-types/decimal)
in its decimal type. So this is basically what @westonpace suggests the
solution for Arrow should be.
#### Postgres
The return type of `sum` appears to be an ["unconstrained numeric"
type](https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL).
This was verified by running
```sql
create table sum_columns as
select
sum(foo) as the_sum
from base_values;
select
table_name,
column_name,
numeric_precision,
numeric_scale
from information_schema.columns
where table_name in ('sum_columns');
```
and noting that `numeric_precision` and `numeric_scale` are both `NULL`.
#### MariaDB
The return type of `sum` appears to be `numeric(min(65, 22 + precision),
scale)`. This was verified in the same way as postgres. N.B. 65 is the [maximum
width](https://mariadb.com/kb/en/decimal/) of a decimal in MariaDB
#### duckdb
The return type of `sum` appears to be `decimal(38, s)` no matter its
[internal
representation](https://duckdb.org/docs/sql/data_types/numeric.html#fixed-point-decimals).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]