jorisvandenbossche commented on pull request #9948:
URL: https://github.com/apache/arrow/pull/9948#issuecomment-817857305
@pitrou thanks a lot for that input.
To be explicit: do you think this logic should be used for *all* inference
code paths?
Because currently, our inference of scale/precision from a string
(`Decimal128::FromString`) vs from a python decimal
(`InferDecimalPrecisionAndScale`) have some different behaviour regarding
trailing zero's. For a value like `Decimal('123E+2')`, the first would give a
(precision, scale) of (5, 0), while the other (3, -2).
And it is this difference that gives a problem because now we first infer
the type from the python decimal in a first pass, but then for the actual
conversion in a second pass go through the string representation of the
decimals, which gives conflicting types, even in the case of non-mixed
decimals. For example with a single decimal (on master):
```
In [1]: from decimal import Decimal
In [2]: pa.array([Decimal('123E+2')])
...
ArrowInvalid: Decimal type with precision 5 does not fit into precision
inferred from first array element: 3
../src/arrow/python/python_to_arrow.cc:169
internal::DecimalFromPyObject(obj, *type, &value)
../src/arrow/python/python_to_arrow.cc:486
PyValue::Convert(this->primitive_type_, this->options_, value)
../src/arrow/python/iterators.h:69 func(value, static_cast<int64_t>(i),
&keep_going)
../src/arrow/python/python_to_arrow.cc:1055 converter->Extend(seq, size)
In [3]: pa.infer_type([Decimal('123E+2')])
Out[3]: Decimal128Type(decimal128(3, -2))
```
I think the logic you proposes matches the current logic of inferring from a
python decimal, but thus not when inferring from a string.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]