Hi Brandon,

You could probably make a copy of the Thrift definition and keep only the fields you need. If you use the classes generated from that trimmed definition to read the metadata, Thrift will skip all the other fields.

Julien
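
To illustrate the skipping described above, here is a rough sketch that does by hand what the generated classes would do for you. It is written in plain Python rather than with Thrift-generated code, so treat it as an illustration and not as the Parquet or Thrift API. It assumes the footer is encoded with Thrift's compact protocol, that the file ends with a 4-byte little-endian footer length followed by the `PAR1` magic, and that `num_rows` is field 3 (an `i64`) of `FileMetaData`, as in the parquet-format definition. The helper names (`read_varint`, `skip_value`, `find_field`, `num_rows`) are made up for this example.

```python
import struct

MAGIC = b"PAR1"
# Thrift compact-protocol type ids, numbered 1..12.
(BOOL_TRUE, BOOL_FALSE, BYTE, I16, I32, I64,
 DOUBLE, BINARY, LIST, SET, MAP, STRUCT) = range(1, 13)


def read_varint(buf, pos):
    """Decode an unsigned varint; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def unzigzag(n):
    """Undo the zigzag encoding used for signed integers."""
    return (n >> 1) ^ -(n & 1)


def skip_value(buf, pos, ttype, in_container=False):
    """Return the position just past one value of the given type."""
    if ttype in (BOOL_TRUE, BOOL_FALSE):
        # A bool field lives in its header; a bool container element takes one byte.
        return pos + 1 if in_container else pos
    if ttype == BYTE:
        return pos + 1
    if ttype in (I16, I32, I64):
        return read_varint(buf, pos)[1]
    if ttype == DOUBLE:
        return pos + 8
    if ttype == BINARY:
        length, pos = read_varint(buf, pos)
        return pos + length
    if ttype in (LIST, SET):
        size, elem = buf[pos] >> 4, buf[pos] & 0x0F
        pos += 1
        if size == 15:  # sizes of 15 or more follow as a varint
            size, pos = read_varint(buf, pos)
        for _ in range(size):
            pos = skip_value(buf, pos, elem, True)
        return pos
    if ttype == MAP:
        size, pos = read_varint(buf, pos)
        if size:
            key, val = buf[pos] >> 4, buf[pos] & 0x0F
            pos += 1
            for _ in range(size):
                pos = skip_value(buf, pos, key, True)
                pos = skip_value(buf, pos, val, True)
        return pos
    if ttype == STRUCT:
        return find_field(buf, pos, wanted=None)[1]
    raise ValueError(f"unknown compact type {ttype}")


def find_field(buf, pos, wanted):
    """Walk one struct, skipping fields; stop early at i64 field `wanted`.

    Returns (value, pos). With wanted=None the whole struct is skipped
    and the result is (None, position after the struct).
    """
    field_id = 0
    while True:
        header = buf[pos]
        pos += 1
        if header == 0:  # STOP byte ends the struct
            return None, pos
        delta, ttype = header >> 4, header & 0x0F
        if delta:
            field_id += delta
        else:  # long form: explicit zigzag-encoded field id
            raw, pos = read_varint(buf, pos)
            field_id = unzigzag(raw)
        if field_id == wanted and ttype == I64:
            raw, pos = read_varint(buf, pos)
            return unzigzag(raw), pos
        pos = skip_value(buf, pos, ttype)


def num_rows(file_bytes):
    """Read field 3 (num_rows) of the FileMetaData in the trailing footer."""
    if file_bytes[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", file_bytes[-8:-4])
    footer = file_bytes[-8 - footer_len:-8]
    return find_field(footer, 0, wanted=3)[0]
```

Only the last 8 bytes plus the footer are needed, so on HDFS you could presumably seek to the end of the file and read just that tail instead of the whole file. I have not tried this against real `_metadata` files, so check the result against an actual count before relying on it.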
On Jul 26, 2014, at 12:16 AM, Brandon Amos wrote:

> Hi Parquet team,
>
> I apologize for the simple question, but I'm using Parquet on HDFS in
> a Scala/Spark application and am having trouble efficiently
> obtaining the number of rows in my Parquet data stores without
> loading and counting.
>
> The README at https://github.com/apache/incubator-parquet-format
> has great information about the format of the metadata,
> and I want to extract the `num_rows` field from the
> `FileMetaData` Thrift object.
> However, the `_metadata` file contained in Parquet databases
> contains many Thrift objects and other information
> in addition to the `FileMetaData` object that I want to extract.
>
> Can anybody give recommendations on how I can most efficiently
> extract the `num_rows` field?
>
> Thanks,
> Brandon.
