[jira] [Commented] (ARROW-9040) [Python][Parquet]"_ParquetDatasetV2" fail to read with columns and use_pandas_metadata=True

Joris Van den Bossche (Jira) Fri, 05 Jun 2020 04:38:12 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126704#comment-17126704
 ]


Joris Van den Bossche commented on ARROW-9040:
----------------------------------------------

Thanks for the report! 
This is because the index column is serialized differently for RangeIndex (as a 
dict, instead of a name referencing the column in the arrow table). 
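
To illustrate, here is a minimal sketch in plain Python. The two metadata dicts are hand-written stand-ins for the "index_columns" entry of the pandas metadata, not output read from a real file, and index_column_names is a hypothetical helper for illustration only. It is not pyarrow's API and not necessarily how the fix is implemented.

```python
# Hand-written stand-ins for the "index_columns" entry of the pandas
# metadata stored in the Parquet schema (assumed shapes for illustration).
# A regular index is referenced by the name of its column in the table:
named_index_meta = {"index_columns": ["__index_level_0__"]}
# A RangeIndex is described by a dict instead of a column name:
range_index_meta = {
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 3, "step": 1}
    ]
}


def index_column_names(metadata):
    """Hypothetical helper: keep only entries that name a real column."""
    return {col for col in metadata["index_columns"] if isinstance(col, str)}


# Building a set from column names works, because strings are hashable.
named = set(named_index_meta["index_columns"])

# Building a set from the RangeIndex descriptor fails: dicts are unhashable.
try:
    set(range_index_meta["index_columns"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(index_column_names(range_index_meta))
```

Filtering on isinstance(col, str) is just one way to separate the two kinds of entries; a RangeIndex has no backing column in the table, so there is nothing to add to the set of columns to read.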

This is already being fixed in 
https://github.com/apache/arrow/pull/7156 (we have a test for this case, but it 
is only being enabled in that PR because it was using a buffer).

> [Python][Parquet]"_ParquetDatasetV2" fail to read with columns and 
> use_pandas_metadata=True
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-9040
>                 URL: https://issues.apache.org/jira/browse/ARROW-9040
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.1
>            Reporter: cmsxbc
>            Priority: Major
>
> This happens when loading a Parquet file written by pandas with the default 
> index.
> Calling _ParquetDatasetV2.read(columns=['column'], 
> use_pandas_metadata=True) raises
> "TypeError: unhashable type: 'dict'" from 
> {code:java}
> index_columns = set(_get_pandas_index_columns(metadata)){code}
> Is this because of the pandas default index?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
