AlenkaF commented on issue #33845:
URL: https://github.com/apache/arrow/issues/33845#issuecomment-1401504187
Your example works well for me once a few typos in the code are
corrected:
```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], '_part_col': [5, 6]})
df
#    a  b  _part_col
# 0  1  3          5
# 1  2  4          6
table = pa.Table.from_pandas(df)
table
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
# Try writing with parquet module as in your example
pq.write_to_dataset(table, root_path='example_pq_ds',
                    use_legacy_dataset=False)
# First read the dataset with dataset module
dataset_pq = ds.dataset('example_pq_ds')
dataset_pq.to_table()
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
# Then read the dataset with the parquet module
pq.read_table('example_pq_ds', use_legacy_dataset=False)
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
pq.read_table('example_pq_ds', use_legacy_dataset=True)
# <stdin>:1: FutureWarning: Passing 'use_legacy_dataset=True' to get the
# legacy behaviour is deprecated as of pyarrow 8.0.0, and the legacy
# implementation will be removed in a future version.
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
```
It also works when the `parquet` and `dataset` modules are used in other ways:
```python
# Try with writing to single file
# and reading with pq.read_table
pq.write_table(table, 'example.parquet')
pq.read_table('example.parquet')
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
# Try reading a single file with the dataset module
dataset = ds.dataset('example.parquet', format="parquet")
dataset.to_table()
# pyarrow.Table
# a: int64
# b: int64
# _part_col: int64
# ----
# a: [[1,2]]
# b: [[3,4]]
# _part_col: [[5,6]]
```
I am running this on the latest master.