[ 
https://issues.apache.org/jira/browse/ARROW-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403083#comment-17403083
 ] 

Joris Van den Bossche commented on ARROW-8017:
----------------------------------------------

Although I closed this as a "won't fix" (short term), we actually have 
ARROW-5931 about a potential way to store custom Python objects.

> [Python] Pyarrow no support for pathlib Path with table = 
> pa.Table.from_pandas() or pd.to_parquet()
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8017
>                 URL: https://issues.apache.org/jira/browse/ARROW-8017
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>         Environment: Conda :
> arrow-cpp                 0.15.1           py38h7cd5009_5
> numba                     0.48.0           py38h0573a6f_0  
> numpy                     1.18.1           py38h4f9e942_0  
> numpy-base                1.18.1           py38hde5b4d6_1
> pandas                    1.0.1            py38h0573a6f_0
> pyarrow                   0.15.1           py38h0573a6f_0
> pycparser                 2.19                       py_0
> python                    3.8.1                h0371630_1  
> python-dateutil           2.8.1                      py_0
>            Reporter: Iemand
>            Priority: Minor
>              Labels: features
>
> Trying to store a table containing Python {{pathlib.Path}} objects raises {{ArrowInvalid}}:
> {{ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not 
> recognize Python value type when inferring an Arrow data type', 'Conversion 
> failed for column filepath with type object')}}
> {{Pandas approach:}}
> {code:python}
> from pathlib import Path
>
> import pandas as pd
>
> df_test = pd.DataFrame({"filepath": [Path("foo", "spam.wav")]})
> df_test.to_parquet("egg.parquet")  # raises ArrowInvalid
> {code}
>  
> {{Parquet approach:}}
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> # df_test as defined in the pandas snippet above
> table = pa.Table.from_pandas(df_test)  # fails here
> # pq.write_table(table, 'egg.parquet', version='2.0')  # never reached
> {code}
>  
> Full error traceback of {{pa.Table.from_pandas}}:
> {code:python}
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-220-bce69439945e> in <module>
>       2 import pyarrow.parquet as pq
>       3 
> ----> 4 table = pa.Table.from_pandas(df_test)
>       5 pq.write_table(table, 'egg.parquet', version='2.0')
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     552 
>     553     if nthreads == 1:
> --> 554         arrays = [convert_column(c, f)
>     555                   for c, f in zip(columns_to_convert, convert_fields)]
>     556     else:
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
>     552 
>     553     if nthreads == 1:
> --> 554         arrays = [convert_column(c, f)
>     555                   for c, f in zip(columns_to_convert, convert_fields)]
>     556     else:
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
>     544             e.args += ("Conversion failed for column {0!s} with type {1!s}"
>     545                        .format(col.name, col.dtype),)
> --> 546             raise e
>     547         if not field_nullable and result.null_count > 0:
>     548             raise ValueError("Field {} was non-nullable but pandas column "
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/pandas_compat.py in convert_column(col, field)
>     538 
>     539         try:
> --> 540             result = pa.array(col, type=type_, from_pandas=True, safe=safe)
>     541         except (pa.ArrowInvalid,
>     542                 pa.ArrowNotImplementedError,
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/array.pxi in pyarrow.lib.array()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ~/anaconda3/envs/soundrhythm/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: ('Could not convert foo/spam.wav with type PosixPath: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column filepath with type object')
> {code}
> Might be related to https://issues.apache.org/jira/browse/ARROW-2046,
> although that issue was about the file save location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
