amoeba commented on issue #43303:
URL: https://github.com/apache/arrow/issues/43303#issuecomment-2237103075
Hi @prayaggordy, you're right that when a dataset is written with
partitioning, the partition fields aren't stored in the files.
By default, Arrow auto-detects the partition fields and their types from the
directory names, as other systems do. You can instead supply a schema for the
partition fields, which I think should get you what you want:
```r
> my_schema <- schema(field("cyl_ch", string()))
> open_dataset("output/partition_cyl_ch", partitioning = my_schema)
FileSystemDataset with 3 Parquet files
13 columns
mpg: double
cyl: double
disp: double
hp: double
drat: double
wt: double
qsec: double
vs: double
am: double
gear: double
carb: double
gear_ch: string
cyl_ch: string
```
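For reference, here is a minimal, self-contained sketch of the round trip. It assumes the dataset was written from `mtcars` with an added character column `cyl_ch` and partitioned on that column. The temporary output path and the `df`, `ds_auto`, and `ds_typed` names are made up for illustration, so adjust them to match your setup:

```r
library(arrow)

# Hypothetical input: mtcars plus a character copy of cyl (an assumption,
# not necessarily how the original dataset was built).
df <- mtcars
df$cyl_ch <- as.character(df$cyl)

# Write one directory per cyl_ch value. The cyl_ch column itself is not
# stored inside the Parquet files; it only appears in the directory names.
out_dir <- file.path(tempdir(), "partition_cyl_ch")
write_dataset(df, out_dir, partitioning = "cyl_ch")

# Default behaviour: the partition field and its type are inferred from
# the paths, so values like "4", "6", "8" may not come back as strings.
ds_auto <- open_dataset(out_dir)
print(ds_auto$schema$GetFieldByName("cyl_ch")$type)

# Explicit schema: cyl_ch is read back as a string column.
ds_typed <- open_dataset(
  out_dir,
  partitioning = schema(field("cyl_ch", string()))
)
print(ds_typed$schema$GetFieldByName("cyl_ch")$type)
```

Comparing the two printed types should show whether auto-detection alone gives you the type you need or whether the explicit schema is required.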
Would this work for your use case?