Martin Thøgersen created ARROW-15494:
----------------------------------------
Summary: [Docs] Clarify {{existing_data_behavior}} docstring
Key: ARROW-15494
URL: https://issues.apache.org/jira/browse/ARROW-15494
Project: Apache Arrow
Issue Type: Improvement
Components: Documentation
Affects Versions: 7.0.1
Reporter: Martin Thøgersen
Slightly clarify the wording of the {{pyarrow.dataset.write_dataset()}} parameter
{{existing_data_behavior}}:
[https://github.com/apache/arrow/blob/a27c55660e575a3987283d5d9e443642db48f215/python/pyarrow/dataset.py#L812-L827]
Proposed wording:
{noformat}
existing_data_behavior : 'error' | 'overwrite_or_ignore' | \
'delete_matching'
Controls how the dataset will handle data that already exists in
the destination. The default behavior ('error') is to raise an error
if any data exists in the `base_dir` destination.
'overwrite_or_ignore' will ignore any existing data and will
overwrite files with the same name as an output file. Other
existing files will be ignored. This behavior, in combination
with a unique basename_template for each write, will allow for
an append workflow.
'delete_matching' is useful when you are writing a partitioned
dataset. The first time each partition leaf-level directory is
encountered, the entire leaf-level directory will be deleted. This
allows you to overwrite old partitions completely.
{noformat}
In other words, clarify that:
- {{'error'}} applies at the {{base_dir}} level.
- {{'delete_matching'}} applies to leaf-level (partition) directories.