Iemand created ARROW-8017:
-----------------------------

Summary: [Python] Pyarrow no support for pathlib Path with table = pa.Table.from_pandas() or pd.to_parquet()
Key: ARROW-8017
URL: https://issues.apache.org/jira/browse/ARROW-8017
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.1
Environment: Conda:
arrow-cpp 0.15.1 py38h7cd5009_5
numba 0.48.0 py38h0573a6f_0
numpy 1.18.1 py38h4f9e942_0
numpy-base 1.18.1 py38hde5b4d6_1
pandas 1.0.1 py38h0573a6f_0
pyarrow 0.15.1 py38h0573a6f_0
pycparser 2.19 py_0
python 3.8.1 h0371630_1
python-dateutil 2.8.1 py_0
Reporter: Iemand

Trying to store a table that contains Python pathlib {{Path}} objects fails with an ArrowInvalid:

{{ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object')}}

Pandas approach:
{code:python}
from pathlib import Path

import pandas as pd

df_test = pd.DataFrame({"filepath": [Path("foo", "spam.wav")]})
df_test.to_parquet("egg.parquet")
{code}

Parquet approach:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

# df_test as defined in the pandas approach above
table = pa.Table.from_pandas(df_test)  # fails here
pq.write_table(table, 'egg.parquet')  # optionally with version='2.0'
{code}

Full error traceback of {{pa.Table.from_pandas}}:
{code:python}
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-220-bce69439945e> in <module>
      2 import pyarrow.parquet as pq
      3 
----> 4 table = pa.Table.from_pandas(df_test)
      5 pq.write_table(table, 'egg.parquet', version='2.0')

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    552 
    553     if nthreads == 1:
--> 554         arrays = [convert_column(c, f)
    555                   for c, f in zip(columns_to_convert, convert_fields)]
    556     else:

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
    552 
    553     if nthreads == 1:
--> 554         arrays = [convert_column(c, f)
    555                   for c, f in zip(columns_to_convert, convert_fields)]
    556     else:

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    544             e.args += ("Conversion failed for column {0!s} with type {1!s}"
    545                        .format(col.name, col.dtype),)
--> 546             raise e
    547         if not field_nullable and result.null_count > 0:
    548             raise ValueError("Field {} was non-nullable but pandas column "

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
    538 
    539         try:
--> 540             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
    541         except (pa.ArrowInvalid,
    542                 pa.ArrowNotImplementedError,

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object')
{code}

This might be related to https://issues.apache.org/jira/browse/ARROW-2046, although that issue was about the file save location rather than {{Path}} values stored in a column.