Hi Mike,

You can use

    import pyarrow.parquet as pq
    pf = pq.ParquetFile(path)
    pf.metadata

or

    pf.schema

This does not read the whole file, only the metadata.

Note that we have a function write_metadata:

https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L777

It would be nice to have a pq.read_metadata method also. I opened a JIRA

https://issues.apache.org/jira/browse/ARROW-1273

since this will lead to a patch in Arrow.

- Wes

On Tue, Jul 25, 2017 at 3:54 PM, Katelman, Michael <[email protected]> wrote:
> Hi,
>
> I was wondering if someone could help me with a metadata-related question.
> Is there anything exposed in pyarrow that would allow me to read parquet
> metadata without reading the entire file? Currently, I use
>
> pyarrow.parquet.read_table(path).schema.metadata
>
> to get the metadata, but would like to be able to get at it without reading
> the entire table.
>
> -Mike
