pitrou commented on PR #44184:
URL: https://github.com/apache/arrow/pull/44184#issuecomment-2414437715
I'm lukewarm about the approach here. Silently casting to the max precision
discards metadata about the input; it also risks producing errors further down
the line (if e.g. the max precision is deemed too large for other operations).
Nor does it automatically eliminate overflow. For example:
```python
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> a = pa.array([789.3] * 20).cast(pa.decimal128(38, 35))
>>> a
<pyarrow.lib.Decimal128Array object at 0x7f0f103ca7a0>
[
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440,
789.29999999999995452526491135358810440
]
>>> pc.sum(a)
<pyarrow.Decimal128Scalar:
Decimal('-1228.11834604692408266343214451664848480')>
```
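To make the negative result less mysterious, here is a rough sketch of the arithmetic using only Python integers. It assumes the sum is accumulated in a signed 128-bit integer that wraps on overflow; that assumption is mine and is not taken from the Arrow source, although it does reproduce the value printed above. `wrap_int128` is a hypothetical helper, not an Arrow function.

```python
from decimal import Decimal, getcontext

# Raise the context precision so scaleb() below never rounds these values.
getcontext().prec = 60

SCALE = 35
element = Decimal("789.29999999999995452526491135358810440")

# Unscaled integer that a decimal128(38, 35) slot would hold for one element.
unscaled = int(element.scaleb(SCALE))


def wrap_int128(n: int) -> int:
    """Reduce n modulo 2**128 and reinterpret it as a signed 128-bit integer."""
    n &= (1 << 128) - 1
    return n - (1 << 128) if n >= (1 << 127) else n


# Exact sum of the 20 elements, as an unbounded Python integer.
total = unscaled * 20
exceeds_precision = total > 10**38 - 1  # more than 38 decimal digits
wrapped = Decimal(wrap_int128(total)).scaleb(-SCALE)

print(exceeds_precision)  # True
print(wrapped)            # -1228.11834604692408266343214451664848480
```

The exact sum needs 5 integer digits, but `decimal128(38, 35)` leaves room for only 3, so the maximum precision cannot hold it.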
We should instead check that the result of an aggregate fits into the
resulting Decimal type. Currently, overflows pass silently:
```python
>>> a = pa.array([123., 456., 789.]).cast(pa.decimal128(4, 1))
>>> a
<pyarrow.lib.Decimal128Array object at 0x7f0ed06261a0>
[
123.0,
456.0,
789.0
]
>>> pc.sum(a)
<pyarrow.Decimal128Scalar: Decimal('1368.0')>
>>> pc.sum(a).validate(full=True)
Traceback (most recent call last):
...
ArrowInvalid: Decimal value 13680 does not fit in precision of decimal128(4, 1)
```
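As an illustration of the kind of check I mean, here is a minimal pure-Python sketch operating on unscaled integers. `checked_decimal_sum` is a hypothetical name and this is not how the Arrow kernels are written; it only shows that the test amounts to comparing the total against `10 ** precision`.

```python
def checked_decimal_sum(unscaled_values, precision):
    """Sum unscaled decimal integers; raise if the total needs more digits
    than `precision` allows (hypothetical helper, not an Arrow API)."""
    total = sum(unscaled_values)
    if abs(total) >= 10 ** precision:
        raise OverflowError(
            f"sum {total} does not fit in precision {precision}"
        )
    return total


# 123.0, 456.0 and 789.0 stored with scale 1, as in the example above.
unscaled = [1230, 4560, 7890]

try:
    checked_decimal_sum(unscaled, precision=4)
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)                                  # True
print(checked_decimal_sum(unscaled, precision=5))  # 13680
```

With precision 4 the sum is rejected, while one extra digit of precision is enough for it to pass.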