Antony Mayi created ARROW-2160:

             Summary: decimal precision inference
                 Key: ARROW-2160
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 0.8.0
            Reporter: Antony Mayi

import pyarrow as pa
import pandas as pd
import decimal

df = pd.DataFrame({'a': [decimal.Decimal('0.1'), decimal.Decimal('0.01')]})
table = pa.Table.from_pandas(df)

pyarrow.lib.ArrowInvalid: Decimal type with precision 2 does not fit into 
precision inferred from first array element: 1

It looks like Arrow infers the maximum precision for a given column from the 
first cell and expects the remaining values to fit into it. I understand this 
is by design, but from the point of view of pandas-Arrow compatibility it is 
quite painful, as pandas is more flexible (as demonstrated).
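
To illustrate the dependence on the first cell: if the two values are 
reordered so the higher-precision one comes first, the same data should 
convert without error (a minimal sketch, assuming the first-cell inference 
described above):

import pyarrow as pa
import pandas as pd
import decimal

# Same values as above, but the highest-precision one is first, so the
# precision inferred from the first cell (2) covers the rest of the column.
df = pd.DataFrame({'a': [decimal.Decimal('0.01'), decimal.Decimal('0.1')]})
table = pa.Table.from_pandas(df)  # no ArrowInvalid raised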

What this means is that a user trying to pass a pandas {{DataFrame}} with 
{{Decimal}} column(s) to an Arrow {{Table}} always has to first (see the 
sketch after this list):
# Find the highest precision used in (each of) the column(s)
# Adjust the first cell of (each of) the column(s) so it carries the highest 
precision of that column
# Only then pass the {{DataFrame}} to {{Table.from_pandas()}}
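
For reference, a minimal sketch of that manual workaround (the helper 
{{adjust_first_cell}} is mine, not part of pyarrow, and it only handles 
columns whose values differ in the number of fractional digits, as in the 
example above):

import pyarrow as pa
import pandas as pd
import decimal

def adjust_first_cell(series):
    """Quantize the first value of a Decimal column to the column's
    maximum scale so Arrow infers a precision covering every row."""
    series = series.copy()
    max_scale = max(max(-v.as_tuple().exponent, 0) for v in series)
    quantum = decimal.Decimal(1).scaleb(-max_scale)    # e.g. Decimal('0.01')
    series.iloc[0] = series.iloc[0].quantize(quantum)  # 0.1 -> 0.10
    return series

df = pd.DataFrame({'a': [decimal.Decimal('0.1'), decimal.Decimal('0.01')]})
df['a'] = adjust_first_cell(df['a'])
table = pa.Table.from_pandas(df)  # first cell now carries the column's precision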

So, given this unavoidable procedure (and assuming Arrow needs to be strict 
about the highest precision for a column), shouldn't this logic be part of 
{{Table.from_pandas()}} directly, to make it transparent?
