Hi All,

I have a question about using pyarrow.Array.from_pandas with the safe flag
set to True.  When the Pandas data contains integers and NULL values, Pandas
promotes the column to a floating point dtype, and when that data is then
cast back to an integer type in Arrow, it raises "ArrowInvalid: Floating
point value truncated". Is this the expected behavior? I'm guessing the safe
check only looks at the types being converted, not the actual values. Is
there a way around this specific error besides setting safe to False?  Here
is a concise example:

>>> pa.Array.from_pandas(pd.Series([1, None]), type=pa.int32(), safe=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 474, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 169, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 69, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Floating point value truncated

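For context, the closest thing to a workaround I've found, short of setting
safe=False, is to pull the nulls out as an explicit mask and cast the
remaining values back to integer myself before handing them to pa.array, so
no floating point data ever reaches the safe cast. A minimal sketch
(assuming NaN is the only null sentinel in the Series):

>>> s = pd.Series([1, None])
>>> mask = s.isna().values                       # True marks a null slot
>>> values = s.fillna(0).astype('int32').values  # placeholder 0 is masked out
>>> pa.array(values, type=pa.int32(), mask=mask)
<pyarrow.lib.Int32Array object at 0x...>
[
  1,
  null
]

That feels like a lot of ceremony for such a common case, though, so I'd be
happy to learn of something simpler.
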
I came across this issue in https://github.com/apache/spark/pull/22807,
specifically in this discussion:
https://github.com/apache/spark/pull/22807#discussion_r246859417.

Thanks!
Bryan
