[
https://issues.apache.org/jira/browse/ARROW-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403083#comment-17403083
]
Joris Van den Bossche commented on ARROW-8017:
----------------------------------------------
Although I closed this as a "won't fix" (short term), we actually have
ARROW-5931 about a potential way to store custom Python objects.
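In the meantime, one possible workaround is to convert the Path objects to plain strings before the data reaches pandas or pyarrow. The following is only a minimal sketch of that idea, not a pyarrow API: `stringify_paths` is a hypothetical helper name, and it assumes the data is available as a mapping of column names to lists of values.

```python
import os
from pathlib import Path, PurePath


def stringify_paths(columns):
    """Return a copy of a {column name: list of values} mapping in which every
    pathlib path is replaced by its string form; other values are left as-is."""
    return {
        name: [os.fspath(v) if isinstance(v, PurePath) else v for v in values]
        for name, values in columns.items()
    }


# Hypothetical example data mirroring the "filepath" column from the report.
data = {"filepath": [Path("foo", "spam.wav"), None], "n": [1, 2]}
clean = stringify_paths(data)
# clean["filepath"] now holds str values (plus None) instead of PosixPath objects.
```

The resulting dict can be passed to `pd.DataFrame(...)` as usual. For an existing DataFrame column without missing values, `df_test["filepath"].astype(str)` has the same effect. Either way the paths are stored as strings, so they would need to be wrapped in `Path(...)` again after reading the file back.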
> [Python] Pyarrow no support for pathlib Path with table =
> pa.Table.from_pandas() or pd.to_parquet()
> ---------------------------------------------------------------------------------------------------
>
> Key: ARROW-8017
> URL: https://issues.apache.org/jira/browse/ARROW-8017
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Environment: Conda :
> arrow-cpp 0.15.1 py38h7cd5009_5
> numba 0.48.0 py38h0573a6f_0
> numpy 1.18.1 py38h4f9e942_0
> numpy-base 1.18.1 py38hde5b4d6_1
> pandas 1.0.1 py38h0573a6f_0
> pyarrow 0.15.1 py38h0573a6f_0
> pycparser 2.19 py_0
> python 3.8.1 h0371630_1
> python-dateutil 2.8.1 py_0
> Reporter: Iemand
> Priority: Minor
> Labels: features
>
> Trying to store a table with Python's pathlib Path will give an ArrowInvalid:
> {{ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not
> recognize Python value type when inferring an Arrow data type', 'Conversion
> failed for column filepath with type object')}}
> {{Pandas approach:}}
> {code:python}
> from pathlib import Path
> import pandas as pd
> df_test = pd.DataFrame({"filepath": [Path("foo", "spam.wav")]})
> df_test.to_parquet("egg.parquet"){code}
>
> {{Parquet approach}}
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> table = pa.Table.from_pandas(df_test) # fails here
> # pq.write_table(table, 'egg.parquet') # , version='2.0'
> {code}
>
> {{Full error Traceback of }}{{pa.Table.from_pandas}}
> {code:python}
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-220-bce69439945e> in <module>
> 2 import pyarrow.parquet as pq
> 3
> ----> 4 table = pa.Table.from_pandas(df_test)
> 5 pq.write_table(table, 'egg.parquet', version='2.0')
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/table.pxi in
> pyarrow.lib.Table.from_pandas()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py
> in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
> 552
> 553 if nthreads == 1:
> --> 554 arrays = [convert_column(c, f)
> 555 for c, f in zip(columns_to_convert, convert_fields)]
> 556 else:
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py
> in <listcomp>(.0)
> 552
> 553 if nthreads == 1:
> --> 554 arrays = [convert_column(c, f)
> 555 for c, f in zip(columns_to_convert, convert_fields)]
> 556 else:
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py
> in convert_column(col, field)
> 544 e.args += ("Conversion failed for column {0!s} with type
> {1!s}"
> 545 .format(col.name, col.dtype),)
> --> 546 raise e
> 547 if not field_nullable and result.null_count > 0:
> 548 raise ValueError("Field {} was non-nullable but pandas
> column "
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py
> in convert_column(col, field)
> 538
> 539 try:
> --> 540 result = pa.array(col, type=type_, from_pandas=True,
> safe=safe)
> 541 except (pa.ArrowInvalid,
> 542 pa.ArrowNotImplementedError,
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/array.pxi in
> pyarrow.lib.array()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in
> pyarrow.lib.check_status()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in
> pyarrow.lib.check_status()
> ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not
> recognize Python value type when inferring an Arrow data type', 'Conversion
> failed for column filepath with type object'){code}
> Might be related to https://issues.apache.org/jira/browse/ARROW-2046 ,
> although that was about file save location.