[
https://issues.apache.org/jira/browse/ARROW-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126704#comment-17126704
]
Joris Van den Bossche commented on ARROW-9040:
----------------------------------------------
Thanks for the report!
This happens because the index is serialized differently in the pandas metadata
for a RangeIndex (as a dict describing the range, instead of a name referencing
a column in the Arrow table).
This is already being fixed by
https://github.com/apache/arrow/pull/7156 (we have a test for this, but it is
only getting enabled in that PR because it was using a buffer).
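To illustrate the failure mode, here is a minimal sketch in plain Python. The
two metadata dicts are hand-written, simplified stand-ins for the pandas
metadata, and index_column_names is a hypothetical helper, not the code from
the PR. It only shows why calling set() on the index entries fails when one of
them is a dict, and one possible way to keep just the entries that name a
column.

```python
# Hand-written, simplified stand-ins for the "pandas" metadata (assumption:
# only the "index_columns" key matters for this illustration).
range_index_meta = {
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 3, "step": 1}
    ]
}
named_index_meta = {"index_columns": ["__index_level_0__"]}


def index_column_names(metadata):
    """Hypothetical helper: keep only index entries that name a stored column.

    A RangeIndex is described by a dict and has no column in the table,
    so dict entries are skipped.
    """
    return {
        entry for entry in metadata["index_columns"] if isinstance(entry, str)
    }


# Building a set directly fails, because dicts are not hashable.
error_message = None
try:
    set(range_index_meta["index_columns"])
except TypeError as exc:
    error_message = str(exc)

# Filtering first avoids the error in both cases.
names_range = index_column_names(range_index_meta)
names_named = index_column_names(named_index_meta)

print(error_message)
print(names_range, names_named)
```

With a RangeIndex the filtered set is empty, since there is no index column to
read back from the file.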
> [Python][Parquet]"_ParquetDatasetV2" fail to read with columns and
> use_pandas_metadata=True
> -------------------------------------------------------------------------------------------
>
> Key: ARROW-9040
> URL: https://issues.apache.org/jira/browse/ARROW-9040
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.17.1
> Reporter: cmsxbc
> Priority: Major
>
> Loading a Parquet file written by pandas with the default index.
> When calling _ParquetDatasetV2.read(columns=['column'],
> use_pandas_metadata=True),
> "TypeError: unhashable type: 'dict'" is raised from
> {code:python}
> index_columns = set(_get_pandas_index_columns(metadata)){code}
> Is it because of the pandas default index?
>
>