Great! Thanks, Wes.

-----Original Message-----
From: Wes McKinney [mailto:[email protected]] 
Sent: Wednesday, July 26, 2017 10:00
To: [email protected]
Subject: Re: metadata reading

It looks like you want

ParquetFile(...).metadata.metadata

https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pyx#L162

Maybe this could be named better

On Wed, Jul 26, 2017 at 9:42 AM, Katelman, Michael 
<[email protected]> wrote:
> The parquet user metadata. I should be able to add it myself as well.
>
> -Mike
>
> -----Original Message-----
> From: Wes McKinney [mailto:[email protected]]
> Sent: Wednesday, July 26, 2017 9:34
> To: [email protected]
> Subject: Re: metadata reading
>
> The Arrow user metadata or the Parquet user metadata? If the latter we 
> may need to add an accessor property to return the Parquet key-value 
> metadata to you
>
> On Wed, Jul 26, 2017 at 7:16 AM, Katelman, Michael 
> <[email protected]> wrote:
>> Thanks, Wes. Does the metadata available through pq.ParquetFile(path).metadata (or 
>> .schema) include user metadata? I only see num rows, num row groups, column 
>> names and types. Maybe I'm not looking in the right place.
>>
>> -Mike
>>
>> -----Original Message-----
>> From: Wes McKinney [mailto:[email protected]]
>> Sent: Tuesday, July 25, 2017 21:54
>> To: [email protected]
>> Subject: Re: metadata reading
>>
>> hi Mike,
>>
>> You can use
>>
>> import pyarrow.parquet as pq
>> pf = pq.ParquetFile(path)
>> pf.metadata
>>
>> or
>>
>> pf.schema
>>
>> This does not read the whole file, only the metadata. Note that we have a 
>> function write_metadata:
>>
>> https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L777
>>
>> It would be nice to have a pq.read_metadata method also. I opened a JIRA 
>> https://issues.apache.org/jira/browse/ARROW-1273
>> since this will lead to a patch in Arrow.
>>
>> - Wes
>>
>> On Tue, Jul 25, 2017 at 3:54 PM, Katelman, Michael 
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> I was wondering if someone could help me with a metadata-related 
>>> question. Is there anything exposed in pyarrow that would allow me 
>>> to read parquet metadata without reading the entire file? Currently, 
>>> I use
>>>
>>> pyarrow.parquet.read_table(path).schema.metadata
>>>
>>> to get the metadata, but would like to be able to get at it without reading 
>>> the entire table.
>>>
>>> -Mike
