westonpace commented on a change in pull request #11632:
URL: https://github.com/apache/arrow/pull/11632#discussion_r744995633
##########
File path: python/pyarrow/dataset.py
##########
@@ -798,6 +799,18 @@ def write_dataset(data, base_dir, basename_template=None,
format=None,
def file_visitor(written_file):
visited_paths.append(written_file.path)
+ existing_data_behavior : 'error' | 'overwrite' | 'delete_matching'
Review comment:
Let's stick with `overwrite_or_ignore`. If we decide later that we need to
change it, that would be a fairly minor change, even if we wanted to keep
backwards compatibility with the old style. The R and Python dataset APIs
are already pretty different.
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -3381,6 +3382,19 @@ def _filesystemdataset_write(
c_options.partitioning = partitioning.unwrap()
c_options.max_partitions = max_partitions
c_options.basename_template = tobytes(basename_template)
+ if existing_data_behavior == 'error':
+ c_options.existing_data_behavior = ExistingDataBehavior_ERROR
+ elif existing_data_behavior == 'overwrite_or_ignore':
+ c_options.existing_data_behavior =\
+ ExistingDataBehavior_OVERWRITE_OR_IGNORE
+ elif existing_data_behavior == 'delete_matching':
+ c_options.existing_data_behavior = ExistingDataBehavior_DELETE_MATCHING
+ else:
+ raise ValueError(
+ ('existing_data_behavior must be one of error, ',
+ 'overwrite_or_ignore or delete_matching')
Review comment:
Good idea, added.
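
One thing to watch in the hunk above: the comma between the two string
literals makes the `ValueError` argument a tuple rather than a single message,
so the error would print as a tuple. Below is a minimal pure-Python sketch of
the same validation with a single concatenated message. The
`ExistingDataBehavior` enum and `parse_existing_data_behavior` helper are
hypothetical stand-ins for illustration, not the Cython code in this PR.

```python
from enum import Enum


class ExistingDataBehavior(Enum):
    # Hypothetical stand-in for the C++ enum exposed through Cython.
    ERROR = "error"
    OVERWRITE_OR_IGNORE = "overwrite_or_ignore"
    DELETE_MATCHING = "delete_matching"


def parse_existing_data_behavior(value):
    """Map a user-supplied string to an enum member, or raise ValueError."""
    try:
        return ExistingDataBehavior(value)
    except ValueError:
        # Adjacent string literals with no comma between them are
        # concatenated into one message string.
        raise ValueError(
            "existing_data_behavior must be one of 'error', "
            "'overwrite_or_ignore' or 'delete_matching', "
            f"got {value!r}"
        ) from None
```

Looking the value up through the enum keeps the accepted strings in one place,
so the error message and the accepted values are less likely to drift apart.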