[ https://issues.apache.org/jira/browse/ARROW-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407565#comment-17407565 ]
Weston Pace commented on ARROW-13813:
-------------------------------------
I think my only concern is that this is something users should be able to do
easily themselves with the compute functions. They could use a scanner to read
in their data, project the offending column through an encoding kernel, and
then partition on the projected column.
However, since we already have segment encoding in partition objects, it seems
straightforward enough to provide. It might be a good project to pair with
ARROW-11378 if someone is looking for good beginner C++ tasks.
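The user-side workaround above (project the offending column through an encoding step, then partition on the projected value) can be sketched without Arrow at all. This is a minimal illustration, not Arrow's API: it assumes percent-encoding via Python's standard `urllib.parse.quote`, Hive-style `key=value` directory names, and rows held as plain dicts. The helpers `encode_segment` and `partition_paths` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from urllib.parse import quote, unquote


def encode_segment(value):
    """Percent-encode a partition value so it is safe as a single path segment.

    Hypothetical helper for illustration; not part of Arrow.
    """
    # safe="" also escapes "/", which would otherwise create a nested directory.
    return quote(str(value), safe="")


def partition_paths(rows, field):
    """Group rows by the encoded value of `field`, keyed by a Hive-style directory.

    Hypothetical helper for illustration; not part of Arrow.
    """
    groups = defaultdict(list)
    for row in rows:
        # "Project" the offending column through the encoding function...
        encoded = encode_segment(row[field])
        # ...then partition on the projected value instead of the raw one.
        groups[f"{field}={encoded}"].append(row)
    return dict(groups)


rows = [
    {"city": "New York", "n": 1},
    {"city": "a/b", "n": 2},
    {"city": "New York", "n": 3},
]
for directory, members in partition_paths(rows, "city").items():
    print(directory, [r["n"] for r in members])
# city=New%20York [1, 3]
# city=a%2Fb [2]
```

Reading such a dataset back needs the inverse step, for example `unquote("a%2Fb") == "a/b"`, which corresponds to the path decoding that ARROW-12644 added on the read side.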
> [C++][Dataset] Support URL encoding of partition field values for the file path
> -------------------------------------------------------------------------------
>
> Key: ARROW-13813
> URL: https://issues.apache.org/jira/browse/ARROW-13813
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: dataset
>
> In ARROW-12644, we added support for _decoding_ the file paths when reading
> datasets. So a valid follow-up question: should we also support _encoding_
> when writing datasets?
> (see also https://github.com/apache/arrow/issues/11027)
> Rereading ARROW-12644, I see there wasn't much discussion of that aspect yet.
> cc [~westonpace] [~lidavidm]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)