[jira] [Commented] (ARROW-6872) [C++][Python] Empty table with dictionary-columns raises ArrowNotImplementedError

Tomas Remes (Jira) Fri, 05 Jun 2020 04:05:26 -0700


    [ https://issues.apache.org/jira/browse/ARROW-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126676#comment-17126676 ]


Tomas Remes commented on ARROW-6872:
------------------------------------

Hello, I am using version 0.17.1 and I get a similar error, shown below, when 
I use "object": pa.dictionary(pa.int64(), pa.string()) in my schema:
{noformat}
Traceback (most recent call last):
  File "/home/tremes/GITHUB/data-pipeline/utils/parquet_serializer.py", line 70, in json_to_parquet
    table = pa.Table.from_pandas(df, schema)
  File "pyarrow/table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 566, in convert_column
    raise e
  File "/home/tremes/GITHUB/data-pipeline/venv/lib64/python3.7/site-packages/pyarrow/pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 106, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: ('Sequence converter for type dictionary<values=string, indices=int64, ordered=0> not implemented', 'Conversion failed for column labels with type object')

{noformat}
Am I missing anything? Thanks.

> [C++][Python] Empty table with dictionary-columns raises 
> ArrowNotImplementedError
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-6872
>                 URL: https://issues.apache.org/jira/browse/ARROW-6872
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.15.0
>            Reporter: Marco Neumann
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> h2. Abstract
> As a pyarrow user, I would expect that I can create an empty table out of 
> every schema that I created via pandas. This does not work for dictionary 
> types (e.g. {{"category"}} dtypes).
> h2. Test Case
> This code:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"x": pd.Series(["x", "y"], dtype="category")})
> table = pa.Table.from_pandas(df)
> schema = table.schema
> table_empty = schema.empty_table()  # boom
> {code}
> produces this exception:
> {noformat}
> Traceback (most recent call last):
>   File "arrow_bug.py", line 8, in <module>
>     table_empty = schema.empty_table()
>   File "pyarrow/types.pxi", line 860, in __iter__
>   File "pyarrow/array.pxi", line 211, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 86, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: Sequence converter for type 
> dictionary<values=string, indices=int8, ordered=0> not implemented
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
