Joris Van den Bossche created ARROW-10695:
---------------------------------------------
Summary: [C++][Dataset] Allow to use a UUID in the
basename_template when writing a dataset
Key: ARROW-10695
URL: https://issues.apache.org/jira/browse/ARROW-10695
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
Currently the user can specify a {{basename_template}}, which can include a
{{"\{i\}"}} placeholder that is replaced with an automatically incremented
integer (so each file written to a single partition gets a unique name):
https://github.com/apache/arrow/blob/master/python/pyarrow/dataset.py#L713-L717
It _might_ be useful to also support a UUID, so that file names are unique in
general (across writes, not only within a single write), and to mimic the
behaviour of the old {{write_to_dataset}} implementation.
For example, we could look for a {{"\{uuid\}"}} placeholder in the template
string and, if present, replace it with a newly generated UUID for each file.
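
As a rough sketch of the idea, in plain Python rather than the actual C++ dataset writer: the helper name {{expand_basename}} is hypothetical, and using the hex form of a random (version 4) UUID is an assumption, not something decided in this issue.

```python
import uuid


def expand_basename(template: str, index: int) -> str:
    """Expand a basename template (hypothetical helper, not an Arrow API).

    "{i}" is replaced with the per-partition file counter, as today.
    "{uuid}" is the proposed addition: if present, it is replaced with a
    freshly generated UUID, so every call yields a different name.
    """
    name = template.replace("{i}", str(index))
    if "{uuid}" in name:
        # Assumption: random UUID rendered as 32 hex characters.
        name = name.replace("{uuid}", uuid.uuid4().hex)
    return name


# Two writes using the same template and counter still get distinct names.
first = expand_basename("part-{uuid}-{i}.parquet", 0)
second = expand_basename("part-{uuid}-{i}.parquet", 0)
```

With a template that only contains {{"\{i\}"}}, the behaviour stays as it is now; the UUID is generated only when the template asks for it.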