[ https://issues.apache.org/jira/browse/ARROW-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123905#comment-17123905 ]
Tom Augspurger commented on ARROW-7782:
---------------------------------------

Joris, was this fix included in 0.17.1? Or is it just for 1.0?

> [Python] Losing index information when using write_to_dataset with partition_cols
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-7782
>                 URL: https://issues.apache.org/jira/browse/ARROW-7782
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: pyarrow==0.15.1
>            Reporter: Ludwik Bielczynski
>            Assignee: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>
>
> One cannot save the index when using {{pyarrow.parquet.write_to_dataset()}}
> with given partition_cols arguments. Here I have created a minimal example
> which shows the issue:
> {code:java}
>
> from pathlib import Path
> import pandas as pd
> from pyarrow import Table
> from pyarrow.parquet import write_to_dataset, read_table
> path = Path('/home/user/trials')
> file_name = 'local_database.parquet'
> df = pd.DataFrame({"A": [1, 2, 3], "B": ['a', 'a', 'b']},
>                   index=pd.Index(['a', 'b', 'c'],
>                   name='idx'))
> table = Table.from_pandas(df)
> write_to_dataset(table,
>                  str(path / file_name),
>                  partition_cols=['B']
>                 )
> df_read = read_table(str(path / file_name))
> df_read.to_pandas()
> {code}
>
> The issue is rather important for pandas and dask users.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)