jorisvandenbossche commented on issue #11027: URL: https://github.com/apache/arrow/issues/11027#issuecomment-909174498
Ah, ARROW-12644 indeed only implemented the _decoding_ when reading, not the equivalent _encoding_ when writing. But if we can read such datasets, we should probably also make it possible to write them? (I will open a JIRA about that.)

@wanx4910 To show that we can read values with an encoded `/` (illustrating what @westonpace mentioned above), I created a small dataset with two directories with URL-encoded values (using a European date format of 2012/01/01):

```
In [44]: !ls test_decoding.parquet/
2012%2F01%2F01  2012%2F01%2F02

In [45]: dataset = ds.dataset("test_decoding.parquet/", partitioning=["date"], format="parquet")

In [46]: dataset
Out[46]: <pyarrow._dataset.FileSystemDataset at 0x7f110c345770>

In [47]: dataset.to_table().to_pandas()
Out[47]:
   b        date
0  1  2012/01/01
1  2  2012/01/02
```

So when reading, we can properly decode such values.
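For reference, the directory names above can be reproduced with plain percent-encoding from the Python standard library. This is only a sketch of the round trip, not how Arrow implements it internally; the helper names `encode_segment` and `decode_segment` are hypothetical and exist only for this illustration.

```python
from urllib.parse import quote, unquote


def encode_segment(value: str) -> str:
    """Percent-encode a partition value so it is safe as one directory name.

    safe="" ensures that "/" is encoded too (quote() leaves it alone by default).
    """
    return quote(value, safe="")


def decode_segment(segment: str) -> str:
    """Reverse of encode_segment: turn "%2F" back into "/"."""
    return unquote(segment)


dates = ["2012/01/01", "2012/01/02"]
encoded = [encode_segment(d) for d in dates]
decoded = [decode_segment(s) for s in encoded]

print(encoded)
print(decoded)
```

Until the writing side does something equivalent, values containing `/` would presumably need to be encoded like this by hand before being used as directory names.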