[
https://issues.apache.org/jira/browse/ARROW-16491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533803#comment-17533803
]
Joris Van den Bossche commented on ARROW-16491:
-----------------------------------------------
It is indeed not doing a safe conversion by default in the nested array case.
Simplifying the example a bit to focus on just the array conversion (which is
done column by column for the pandas.DataFrame -> Table conversion):
{code:python}
>>> pa.array(np.array([[1.5], [2.5, 3.5]], dtype=object),
>>> type=pa.list_(pa.int64()), safe=True)
<pyarrow.lib.ListArray object at 0x7f004fc74700>
[
[
1
],
[
2,
3
]
]
{code}
I noticed that for the non-nested (primitive) array case, this also depends on
whether the input is already an array or a generic list like:
{code:python}
>>> pa.array(np.array([1.5, 2.5]), type=pa.int64(), safe=True)
...
ArrowInvalid: Float value 1.5 was truncated converting to int64
{code}
vs
{code:python}
>>> pa.array([1.5, 2.5], type=pa.int64(), safe=True)
<pyarrow.lib.Int64Array object at 0x7f004fc72c40>
[
1,
2
]
{code}
Those two take a different code path ({{numpy_to_array.cc}} vs
{{python_to_arrow.cc}}), so apparently the {{safe}} keyword is not properly
handled in the second code path.
(don't know by heart if the nested array case also takes that second code path,
in which case it might be the same issue)
Another potentially related issue: ARROW-8567
> [Python] Table.from_pandas is doing unsafe cast for float array to int array
> ----------------------------------------------------------------------------
>
> Key: ARROW-16491
> URL: https://issues.apache.org/jira/browse/ARROW-16491
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.1, 7.0.0
> Reporter: LOUSSOUARN Brieuc
> Priority: Minor
> Attachments: image-2022-05-06-13-59-47-720.png,
> image-2022-05-06-14-04-37-954.png
>
>
> Hello,
> safe option is working well for scalar but not for list. To reproduce :
> {code:python}
> import pandas as pd
> import pyarrow as pa
> int_dataframe = pd.DataFrame({"array": [[1, 2]]})
> float_dataframe = pd.DataFrame({"array": [[1.5, 2.3]]})
> int_table = pa.Table.from_pandas(int_dataframe)
> {code}
> {code:python}
> >>> int_table
> pyarrow.Table
> array: list<item: int64>
> child 0, item: int64
> ----
> array: [[[1,2]]]
> {code}
> {code:python}
> # this is working instead of throwing a `ArrowInvalid: ... Conversion failed
> for column array with type`
> >>> table = pa.Table.from_pandas(float_dataframe, schema=int_table.schema)
> >>> table
> pyarrow.Table
> array: list<item: int64>
> child 0, item: int64
> ----
> array: [[[1,2]]]
> {code}
> Behavior for scalar is correct :
> {code:python}
> int_dataframe = pd.DataFrame({"array": [1]})
> float_dataframe = pd.DataFrame({"array": [1.5]})
> int_table = pa.Table.from_pandas(int_dataframe)
> table = pa.Table.from_pandas(float_dataframe, schema=int_table.schema) #
> raise:
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> Input In [6], in <module>
> ----> 1 table = pa.Table.from_pandas(float_dataframe, schema=int_table.schema)
> 2 table
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/table.pxi:1782,
> in pyarrow.lib.Table.from_pandas()
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:594,
> in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 589 return (isinstance(arr, np.ndarray) and
> 590 arr.flags.contiguous and
> 591 issubclass(arr.dtype.type, np.integer))
> 593 if nthreads == 1:
> --> 594 arrays = [convert_column(c, f)
> 595 for c, f in zip(columns_to_convert, convert_fields)]
> 596 else:
> 597 arrays = []
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:594,
> in <listcomp>(.0)
> 589 return (isinstance(arr, np.ndarray) and
> 590 arr.flags.contiguous and
> 591 issubclass(arr.dtype.type, np.integer))
> 593 if nthreads == 1:
> --> 594 arrays = [convert_column(c, f)
> 595 for c, f in zip(columns_to_convert, convert_fields)]
> 596 else:
> 597 arrays = []
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:581,
> in dataframe_to_arrays.<locals>.convert_column(col, field)
> 576 except (pa.ArrowInvalid,
> 577 pa.ArrowNotImplementedError,
> 578 pa.ArrowTypeError) as e:
> 579 e.args += ("Conversion failed for column {!s} with type {!s}"
> 580 .format(col.name, col.dtype),)
> --> 581 raise e
> 582 if not field_nullable and result.null_count > 0:
> 583 raise ValueError("Field {} was non-nullable but pandas column "
> 584 "had {} null values".format(str(field),
> 585 result.null_count))
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:575,
> in dataframe_to_arrays.<locals>.convert_column(col, field)
> 572 type_ = field.type
> 574 try:
> --> 575 result = pa.array(col, type=type_, from_pandas=True, safe=safe)
> 576 except (pa.ArrowInvalid,
> 577 pa.ArrowNotImplementedError,
> 578 pa.ArrowTypeError) as e:
> 579 e.args += ("Conversion failed for column {!s} with type {!s}"
> 580 .format(col.name, col.dtype),)
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/array.pxi:312,
> in pyarrow.lib.array()
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/array.pxi:83,
> in pyarrow.lib._ndarray_to_array()
> File
> ~/Documents/chouket/.venv/lib/python3.9/site-packages/pyarrow/error.pxi:99,
> in pyarrow.lib.check_status()
> ArrowInvalid: ('Float value 1.5 was truncated converting to int64',
> 'Conversion failed for column array with type float64')
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)