jorisvandenbossche commented on a change in pull request #10628:
URL: https://github.com/apache/arrow/pull/10628#discussion_r666787458
##########
File path: python/pyarrow/dataset.py
##########
@@ -731,6 +731,24 @@ def write_dataset(data, base_dir, basename_template=None,
format=None,
(e.g. S3)
max_partitions : int, default 1024
Maximum number of partitions any batch may be written into.
+ file_visitor : Function
+ If set, this function will be called with a WrittenFile instance
+ for each file created during the call. This object will have both
+ a path attribute and a metadata attribute.
+
+ The path attribute will a str containing the absolute path to
+ the created file.
+
+ The metadata attribute will be the parquet metadata of the file.
+ This metadata will have the file path attribute set and can be used
+ to build a _metadata file. The metadata attribute will be None if
+ the format is not parquet.
+
+ # Example visitor which simple collects the filenames created
+ visited_paths = []
+
+ def file_visitor(written_file):
+ visited_paths.append(written_file.path)
Review comment:
```suggestion
Example visitor which simply collects the filenames created::

    visited_paths = []

    def file_visitor(written_file):
        visited_paths.append(written_file.path)
```
(then it will render in a code block in the online docs)
##########
File path: python/pyarrow/dataset.py
##########
@@ -731,6 +731,24 @@ def write_dataset(data, base_dir, basename_template=None,
format=None,
(e.g. S3)
max_partitions : int, default 1024
Maximum number of partitions any batch may be written into.
+ file_visitor : Function
+ If set, this function will be called with a WrittenFile instance
+ for each file created during the call. This object will have both
+ a path attribute and a metadata attribute.
+
+ The path attribute will a str containing the absolute path to
Review comment:
Based on a quick test, these are not always absolute paths: it depends
on what was passed as base_dir for the written dataset (whether that is an
absolute or a relative path).
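If callers need absolute paths regardless of how base_dir was given, the visitor can normalise them itself. Below is a minimal sketch of that idea, not the PR's implementation: the `WrittenFile` namedtuple is a hypothetical stand-in that models only the two attributes described in the docstring, and the visitor is called by hand here rather than by `write_dataset`.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the WrittenFile object described in the docstring;
# only the documented `path` and `metadata` attributes are modelled.
WrittenFile = namedtuple("WrittenFile", ["path", "metadata"])

visited_paths = []


def file_visitor(written_file):
    # Normalise the path so the collected value does not depend on whether
    # base_dir was passed as a relative or an absolute path.
    visited_paths.append(os.path.abspath(written_file.path))


# Simulate two visitor calls: one with a relative path, one with an absolute path.
relative = os.path.join("dataset_root", "part-0.parquet")
absolute = os.path.abspath(os.path.join("dataset_root", "part-1.parquet"))
file_visitor(WrittenFile(path=relative, metadata=None))
file_visitor(WrittenFile(path=absolute, metadata=None))
```

Note that `os.path.abspath` resolves relative paths against the current working directory, so this only makes sense for local filesystems and not for remote ones such as S3.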