jorisvandenbossche commented on code in PR #12811:
URL: https://github.com/apache/arrow/pull/12811#discussion_r845291213
##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -1290,7 +1290,7 @@ def _test_write_to_dataset_no_partitions(base_path,
# Without partitions, append files to root_path
n = 5
for i in range(n):
- pq.write_to_dataset(output_table, base_path,
+ pq.write_to_dataset(output_table, base_path, use_legacy_dataset=True,
Review Comment:
The reason this test is otherwise failing is that the new dataset
implementation uses a fixed file name (`part-0.parquet`), whereas before
we used a uuid-based filename. As a result, the non-legacy writer
overwrites the same file on each iteration of the loop.
To what extent could users run into this as well? In theory, we could
use the `basename_template` argument to replicate this "uuid" filename
behaviour inside `pq.write_to_dataset`.
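For example, a minimal sketch (assuming `basename_template` is forwarded
to the underlying `pyarrow.dataset.write_dataset` call in the non-legacy
path), where each call gets its own unique template so repeated writes
append new files instead of overwriting `part-0.parquet`:

```python
import uuid

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})

# "base_path" is a hypothetical target directory. "{i}" is the
# per-file counter the dataset writer substitutes; the uuid prefix
# makes each call's filenames unique, mimicking the legacy behaviour.
for _ in range(5):
    pq.write_to_dataset(
        table, "base_path",
        basename_template=f"{uuid.uuid4()}-{{i}}.parquet",
    )
```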