[GitHub] [arrow] mirkhosro commented on a diff in pull request #13591: ARROW-17046 [Python] improve documentation of pyarrow.parquet.write_to_dataset function

GitBox Wed, 13 Jul 2022 21:40:59 -0700


mirkhosro commented on code in PR #13591:
URL: https://github.com/apache/arrow/pull/13591#discussion_r920744447



##########
python/pyarrow/parquet/__init__.py:
##########
@@ -3063,16 +3064,19 @@ def write_to_dataset(table, root_path, partition_cols=None,
         used determined by the number of available CPU cores.
     schema : Schema, optional
     partitioning : Partitioning or list[str], optional
+        (This option is used only when `use_legacy_dataset` is False.)

Review Comment:
   You're right about `use_threads`. However, we should be able to pass `schema`, 
which `write_table` forwards to `ParquetWriter` through its kwargs in the legacy 
code path. In fact, my code relied on that behavior in 7.0.0, where I specified 
the schema through that option, but it now raises an error in 8.0.0.
   I think one fix would be to remove this check and pass `schema` to 
`write_table` as a keyword argument, which would in turn pass it to 
`ParquetWriter`. I can make that change if you agree that this is the correct 
behavior.
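
A minimal sketch of the forwarding described above, assuming nothing about the real pyarrow internals: `WriterSketch`, `write_table_sketch`, and `write_to_dataset_sketch` are hypothetical stand-ins, not the pyarrow functions, and the actual signatures in `python/pyarrow/parquet/__init__.py` may differ. It only illustrates a `schema` keyword travelling from the dataset-level function down to the writer instead of being rejected by a check.

```python
# Stand-ins only: these are NOT the pyarrow implementations.
class WriterSketch:
    """Hypothetical stand-in for a Parquet writer; records what it receives."""

    def __init__(self, where, schema=None, **options):
        self.where = where
        self.schema = schema
        self.options = options


def write_table_sketch(table, where, **kwargs):
    """Hypothetical stand-in for write_table: forwards kwargs to the writer."""
    return WriterSketch(where, **kwargs)


def write_to_dataset_sketch(table, root_path, schema=None, **kwargs):
    """Hypothetical legacy path: forward `schema` instead of raising on it."""
    if schema is not None:
        kwargs["schema"] = schema  # passed along as a keyword argument
    return write_table_sketch(table, f"{root_path}/part-0.parquet", **kwargs)


# Plain dicts stand in for a table and a schema in this sketch.
writer = write_to_dataset_sketch(
    table={"a": [1, 2]},
    root_path="dataset_root",
    schema={"a": "int64"},
    compression="snappy",
)
print(writer.schema)
```

Whether the real `write_table` can accept `schema` this way without clashing with the schema it already derives from the table is something the actual change would need to verify.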



