jorisvandenbossche commented on code in PR #12811:
URL: https://github.com/apache/arrow/pull/12811#discussion_r845291213
##########
python/pyarrow/tests/parquet/test_dataset.py:
##########
@@ -1290,7 +1290,7 @@ def _test_write_to_dataset_no_partitions(base_path,
# Without partitions, append files to root_path
n = 5
for i in range(n):
- pq.write_to_dataset(output_table, base_path,
+ pq.write_to_dataset(output_table, base_path, use_legacy_dataset=True,
Review Comment:
The reason this test is otherwise failing is that the new dataset
implementation uses a fixed file name (`part-0.parquet`), whereas before
we used a uuid-based filename. As a result, the non-legacy writer
overwrites the same file on each iteration of the loop.
To what extent could users run into this as well? In theory, we could
use the `basename_template` argument to replicate this "uuid" filename
behaviour inside `pq.write_to_dataset`.
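For example, a minimal sketch (assuming `basename_template` is forwarded
to the underlying `pyarrow.dataset.write_dataset` call in the non-legacy
path), where each call gets its own unique template so repeated writes
append new files instead of overwriting `part-0.parquet`:

```python
import uuid

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})

# "base_path" is a hypothetical target directory. "{i}" is the
# per-file counter the dataset writer substitutes; the uuid prefix
# makes each call's filenames unique, mimicking the legacy behaviour.
for _ in range(5):
    pq.write_to_dataset(
        table, "base_path",
        basename_template=f"{uuid.uuid4()}-{{i}}.parquet",
    )
```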