[ https://issues.apache.org/jira/browse/ARROW-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126676#comment-17126676 ]
Tomas Remes commented on ARROW-6872:
------------------------------------
Hello, I am using version 0.17.1 and I am getting the following, similar error when
trying to use {{"object": pa.dictionary(pa.int64(), pa.string())}} in my schema:
{noformat}
Traceback (most recent call last):
  File "/home/tremes/GITHUB/data-pipeline/utils/parquet_serializer.py", line 70, in json_to_parquet
    table = pa.Table.from_pandas(df, schema)
  File "pyarrow/table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 566, in convert_column
    raise e
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 106, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: ('Sequence converter for type dictionary<values=string, indices=int64, ordered=0> not implemented', 'Conversion failed for column labels with type object')
{noformat}
Am I missing anything? Thanks.
> [C++][Python] Empty table with dictionary-columns raises
> ArrowNotImplementedError
> ---------------------------------------------------------------------------------
>
> Key: ARROW-6872
> URL: https://issues.apache.org/jira/browse/ARROW-6872
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.15.0
> Reporter: Marco Neumann
> Assignee: Joris Van den Bossche
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.17.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> h2. Abstract
> As a pyarrow user, I would expect that I can create an empty table out of
> every schema that I created via pandas. This does not work for dictionary
> types (e.g. {{"category"}} dtypes).
> h2. Test Case
> This code:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"x": pd.Series(["x", "y"], dtype="category")})
> table = pa.Table.from_pandas(df)
> schema = table.schema
> table_empty = schema.empty_table()  # raises ArrowNotImplementedError
> {code}
> produces this exception:
> {noformat}
> Traceback (most recent call last):
>   File "arrow_bug.py", line 8, in <module>
>     table_empty = schema.empty_table()
>   File "pyarrow/types.pxi", line 860, in __iter__
>   File "pyarrow/array.pxi", line 211, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: Sequence converter for type dictionary<values=string, indices=int8, ordered=0> not implemented
> {noformat}