quazzuk opened a new issue #8420:
URL: https://github.com/apache/arrow/issues/8420
```
import os

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

df = pd.DataFrame(dict(symbol=["A", "B", "C", "D"],
                       year=[2017, 2018, 2019, 2020],
                       close=np.arange(4)))

table = pa.Table.from_pandas(df)
print(f"\nbefore:\n{table.schema.to_string(show_field_metadata=False)}")

# Write a hive-partitioned parquet dataset ...
root_path = "test"
os.makedirs(root_path, exist_ok=True)
pq.write_to_dataset(table, root_path=root_path,
                    partition_cols=["symbol", "year"])

# ... and read it back through the datasets API.
dataset = ds.dataset(root_path, format="parquet", partitioning="hive")
table2 = dataset.to_table()
print(f"\nafter:\n{table2.schema.to_string(show_field_metadata=False)}")
```
before:
symbol: string
year: int64
close: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' +
582
after:
close: int64
symbol: string
year: int32
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [{"name": null, "field_n' +
300
That is, the column ordering and the column types are not preserved across the round trip (the partition columns move to the end and `year` comes back as int32). I suspect this might be due to the hive partitioning. Should I be storing additional metadata (e.g. the original schema) and applying it when subsequently reading the dataset back, roughly along the lines of the sketch below?
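For concreteness, something like this is what I had in mind. It is only a sketch, not something I have verified is the intended approach: the explicit partitioning schema and the final `select`/`cast` are my guesses at a workaround, and `table`/`root_path` refer to the snippet above.
```
import pyarrow as pa
import pyarrow.dataset as ds

# Pin the partition column types up front instead of letting hive
# discovery infer them (discovery appears to infer int32 for "year").
part = ds.partitioning(
    pa.schema([("symbol", pa.string()), ("year", pa.int64())]),
    flavor="hive",
)
dataset = ds.dataset(root_path, format="parquet", partitioning=part)

# Restore the original column order and types from the schema of the
# table that was written out.
original_schema = table.schema
restored = dataset.to_table().select(original_schema.names).cast(original_schema)
print(restored.schema.to_string(show_field_metadata=False))
```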
Thanks