[ https://issues.apache.org/jira/browse/ARROW-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-8733:
-----------------------------------------
    Fix Version/s:     (was: 1.0.0)

> [C++][Dataset][Python] ParquetFileFragment should provide access to parquet FileMetadata
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-8733
>                 URL: https://issues.apache.org/jira/browse/ARROW-8733
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset
>
> Related to ARROW-8062 (as there we will also need a way to expose the global
> FileMetadata). But independently, it would be useful to get access to the
> FileMetadata on each {{ParquetFileFragment}} (e.g. to get access to the
> statistics).
> This would be relatively simple to code on the Python/R side, since we have
> access to the file path: we could read the metadata from the file backing
> the fragment and return it as a FileMetadata object.
> I am wondering if we want to integrate this with ARROW-8062: when the
> fragments were created from a {{_metadata}} file, a
> {{ParquetFileFragment.metadata}} attribute would not need to read the
> metadata from the Parquet file itself, but could take it from the global
> metadata (at least for e.g. the row group data).
> Another question: what about a ParquetFileFragment that maps to a single row
> group?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)