jorisvandenbossche commented on code in PR #14574:
URL: https://github.com/apache/arrow/pull/14574#discussion_r1019985446
##########
python/pyarrow/parquet/core.py:
##########
@@ -3423,16 +3427,22 @@ def write_metadata(schema, where,
metadata_collector=None, **kwargs):
... table.schema, 'dataset_metadata/_metadata',
... metadata_collector=metadata_collector)
"""
- writer = ParquetWriter(where, schema, **kwargs)
+ filesystem, where = _resolve_filesystem_and_path(where, filesystem)
+
+ writer = ParquetWriter(where, schema, filesystem, **kwargs)
writer.close()
if metadata_collector is not None:
# ParquetWriter doesn't expose the metadata until it's written. Write
# it and read it again.
- metadata = read_metadata(where)
+ metadata = read_metadata(where, filesystem=filesystem)
for m in metadata_collector:
metadata.append_row_groups(m)
- metadata.write_metadata_file(where)
+ file_ctx = nullcontext()
+ if filesystem is not None:
+ file_ctx = where = filesystem.open_output_stream(where)
+ with file_ctx:
+ metadata.write_metadata_file(where)
Review Comment:
Actually, one more thing I realize now: with `_resolve_filesystem_and_path`, the only case in which `filesystem` will still be `None` is when `where` is a file-like object. But in that case, I am not sure the code handling `metadata_collector` would work anyway (writing, reading, and then writing again to the same file-like object — will that work?). It would be good to test that, so we either have coverage for it (if it works), or we can simplify the code above (since we could then assume `filesystem` is never `None`).
--
This is an automated message from the Apache Git Service.