It looks like you want ParquetFile(...).metadata.metadata
https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pyx#L162

Maybe this could be named better.

On Wed, Jul 26, 2017 at 9:42 AM, Katelman, Michael <[email protected]> wrote:
> The parquet user metadata. I should be able to add it myself as well.
>
> -Mike
>
> -----Original Message-----
> From: Wes McKinney [mailto:[email protected]]
> Sent: Wednesday, July 26, 2017 9:34
> To: [email protected]
> Subject: Re: metadata reading
>
> The Arrow user metadata or the Parquet user metadata? If the latter, we may
> need to add an accessor property to return the Parquet key-value metadata to
> you.
>
> On Wed, Jul 26, 2017 at 7:16 AM, Katelman, Michael
> <[email protected]> wrote:
>> Thanks, Wes. Does the metadata obtained through
>> pq.ParquetFile(path).metadata (or .schema) include user metadata? I only
>> see num rows, num row groups, column names and types. Maybe I'm not
>> looking in the right place.
>>
>> -Mike
>>
>> -----Original Message-----
>> From: Wes McKinney [mailto:[email protected]]
>> Sent: Tuesday, July 25, 2017 21:54
>> To: [email protected]
>> Subject: Re: metadata reading
>>
>> hi Mike,
>>
>> You can use
>>
>> import pyarrow.parquet as pq
>> pf = pq.ParquetFile(path)
>> pf.metadata
>>
>> or
>>
>> pf.schema
>>
>> This does not read the whole file, only the metadata. Note that we have a
>> function write_metadata:
>>
>> https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L777
>>
>> It would be nice to have a pq.read_metadata method also. I opened a JIRA
>> https://issues.apache.org/jira/browse/ARROW-1273
>> since this will lead to a patch in Arrow.
>>
>> - Wes
>>
>> On Tue, Jul 25, 2017 at 3:54 PM, Katelman, Michael
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> I was wondering if someone could help me with a metadata-related
>>> question. Is there anything exposed in pyarrow that would allow me to
>>> read parquet metadata without reading the entire file? Currently, I
>>> use
>>>
>>> pyarrow.parquet.read_table(path).schema.metadata
>>>
>>> to get the metadata, but would like to be able to get at it without
>>> reading the entire table.
>>>
>>> -Mike
