martgra opened a new issue #11826:
URL: https://github.com/apache/arrow/issues/11826
Hi,
I'm wondering how partitioning works in the new Datasets API.
This is the part of my code that writes the data:
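```python
import pyarrow as pa
import pyarrow.dataset  # needed so that pa.dataset is available

# table, output_path and y are defined earlier in my code
pa.dataset.write_dataset(
    table,
    output_path,
    basename_template=f"chunk_{y}_{{i}}",
    format="parquet",
    partitioning=["code"],
    existing_data_behavior="overwrite_or_ignore",
)
```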
However, reading the data back results in an error:
```python
>>> import pandas as pd
>>> pd.read_parquet(path, filters=[('code', '=', "1234")])
Traceback (most recent call last):
  ...
ArrowInvalid: No match for FieldRef.Name(code)
```
Is this expected, i.e. do partition columns disappear from the table? I have
also tried reading the data directly with pyarrow, and the "code" column is
missing from the table's columns there as well.
FYI:
Thank you!
--
This is an automated message from the Apache Git Service.