AlenkaF commented on code in PR #13591:
URL: https://github.com/apache/arrow/pull/13591#discussion_r919713913


##########
python/pyarrow/parquet/__init__.py:
##########
@@ -3063,16 +3064,19 @@ def write_to_dataset(table, root_path, 
partition_cols=None,
         used determined by the number of available CPU cores.
     schema : Schema, optional
     partitioning : Partitioning or list[str], optional
+        (This option is used only when `use_legacy_dataset` is False.)

Review Comment:
   I would suggest moving the info in the parentheses to the end of the 
argument description and remove the (). Similar for other arguments.
   
   As information for arguments that are only used in the new dataset 
implementation is added, would it be good to add it also to `schema` and 
`use_threads`? 



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -3049,6 +3049,7 @@ def write_to_dataset(table, root_path, 
partition_cols=None,
         Column names by which to partition the dataset.
         Columns are partitioned in the order they are given
     partition_filename_cb : callable,
+        (This option is used only when `use_legacy_dataset` is True.)

Review Comment:
   In the case of `partition_filename_cb`:
    - if `use_legacy_dataset=None` and `partition_filename_cb` is specified by 
the user, the implementation used will be set to legacy implementation, so no 
actions are needed by the user.
    - if `use_legacy_dataset=False`, the user will receive a ValueError 
suggesting to specify `use_legacy_dataset=True` to be able to use 
`partition_filename_cb`.
   
   So in this case I would maybe add a slightly different information . Can be 
to also specify `use_legacy_dataset=True` or that the new dataset 
implementation doesn't support this argument.



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -3114,8 +3114,11 @@ def file_visitor(written_file):
         the entire directory will be deleted.  This allows you to overwrite
         old partitions completely.
     **kwargs : dict,
-        Additional kwargs for write_table function. See docstring for
-        `write_table` or `ParquetWriter` for more information.
+        When `use_legacy_dataset` is False, used as additional kwargs for 
+        `pyarrow.dataset.ParquetFileFormat.make_write_options` function.

Review Comment:
   I think this should be ...
   
   ```suggestion
           `pyarrow.dataset.write_dataset` function.
   ```
   Then something like this can be added: (See docstring for `write_dataset` or 
 `ParquetFileFormat` for more information).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to