bkietz commented on a change in pull request #8507:
URL: https://github.com/apache/arrow/pull/8507#discussion_r512801346
##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -905,13 +896,12 @@ def test_fragments_parquet_ensure_metadata(tempdir, open_logging_fs):
assert row_group.num_rows == 2
assert row_group.statistics is not None
- # pickling preserves row group ids but not statistics
+ # pickling preserves row group ids and statistics
Review comment:
The fragment is currently always reopened, since every property of
`ParquetFileFragment` ensures complete metadata. Ideally the metadata would be
serialized during pickling, so that unpickling would produce a fully loaded Parquet
fragment. That has to wait for a follow-up, though: there is a non-trivial
mismatch between how Parquet views schemas (depth-first, flat-indexed) and how
Arrow views schemas, and bridging the two currently requires an open reader.
Once we disentangle this, we can avoid reopening files.
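To illustrate the schema mismatch, here is a minimal sketch. It uses a hypothetical nested-tuple schema model, not the pyarrow API. Parquet assigns flat indices only to leaf columns, in depth-first order, and per-column statistics are keyed by those leaf indices. Arrow, by contrast, addresses top-level fields that may themselves be nested. Mapping one view onto the other therefore means walking the full schema. The helper names `parquet_leaf_paths` and `arrow_field_to_leaf_indices` are hypothetical, and lists are assumed to use Parquet's standard three-level encoding (`<name>.list.element`).

```python
# Hypothetical, simplified schema model (NOT the pyarrow API): a field is
# (name, type), where type is a primitive type name (str),
# ("struct", [fields...]) or ("list", element_field).

def parquet_leaf_paths(fields):
    """Return leaf column paths in depth-first order, which is the order
    in which Parquet assigns flat column indices."""
    leaves = []

    def visit(path, typ):
        if isinstance(typ, str):
            # Primitive type: becomes exactly one Parquet leaf column.
            leaves.append(".".join(path))
        elif typ[0] == "struct":
            # Struct children are visited in declaration order (depth-first).
            for child_name, child_type in typ[1]:
                visit(path + [child_name], child_type)
        elif typ[0] == "list":
            # Assumed three-level list encoding: <name>.list.<element>
            elem_name, elem_type = typ[1]
            visit(path + ["list", elem_name], elem_type)
        else:
            raise ValueError(f"unsupported type: {typ!r}")

    for name, typ in fields:
        visit([name], typ)
    return leaves


def arrow_field_to_leaf_indices(fields):
    """Map each top-level (Arrow-style) field to the flat Parquet leaf
    column indices that lie beneath it."""
    mapping = {}
    index = 0
    for name, typ in fields:
        n_leaves = len(parquet_leaf_paths([(name, typ)]))
        mapping[name] = list(range(index, index + n_leaves))
        index += n_leaves
    return mapping


schema = [
    ("a", ("struct", [("b", "int64"), ("c", ("list", ("element", "int32")))])),
    ("d", "string"),
]

print(parquet_leaf_paths(schema))
# ['a.b', 'a.c.list.element', 'd']
print(arrow_field_to_leaf_indices(schema))
# {'a': [0, 1], 'd': [2]}
```

Top-level field `d` is Arrow field 1 but Parquet column 2. Statistics serialized by leaf index can only be attached to the right Arrow field once a mapping like this is available without an open reader.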
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]