Hi Brandon,
You could probably make a copy of the Thrift definition and keep only the
fields you need. If you use the generated classes to read the metadata,
Thrift will skip all the other fields.
Julien

On Jul 26, 2014, at 12:16 AM, Brandon Amos wrote:

> Hi Parquet team,
> 
> I apologize for the simple question, but I'm using Parquet on HDFS in
> a Scala/Spark application and am having trouble efficiently
> obtaining the number of rows in my Parquet data stores without
> loading and counting.
> 
> The README at https://github.com/apache/incubator-parquet-format
> has great information about the format of the metadata,
> and I want to extract the `num_rows` field from the
> `FileMetaData` Thrift object.
> However, the `_metadata` file in a Parquet database
> contains many Thrift objects and other information
> in addition to the `FileMetaData` object that I want to extract.
> 
> Can anybody give recommendations on how I can most efficiently
> extract the `num_rows` field?
> 
> Thanks,
> Brandon.
