Hi Hyukjin,

I think the code you're looking for is generated by parquet-generator, so that we have a version specific to each primitive type:


https://github.com/apache/parquet-mr/blob/master/parquet-generator/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java

rb

On 09/16/2015 06:57 PM, Hyukjin Kwon wrote:
Hi all,

I am pretty new to Parquet and trying to learn Parquet structure.

I assume that statistics such as min and max have been stored in both
ColumnMetaData and DataPageHeader since 1.6.0 (
https://github.com/Parquet/parquet-mr/pull/338).

I see that the statistics in ColumnMetaData are used by filter2 to filter
blocks (or row groups), in RowGroupFilter, by calling canDrop().
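
As a rough illustration of the row-group check described above, here is a
minimal sketch of min/max-based dropping. It is not the parquet-mr API: the
class StatsFilterSketch, the type IntStats, and the methods canDropEq and
canDropGt are hypothetical names made up for this example, and it assumes a
single int column with no nulls.

```java
// Hypothetical sketch of statistics-based filtering; NOT the parquet-mr API.
public final class StatsFilterSketch {

    /** Min/max statistics for one column chunk (hypothetical type). */
    static final class IntStats {
        final int min;
        final int max;

        IntStats(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /** "column == target": droppable when target lies outside [min, max]. */
    static boolean canDropEq(IntStats stats, int target) {
        return target < stats.min || target > stats.max;
    }

    /** "column > bound": droppable when even the max is not greater than bound. */
    static boolean canDropGt(IntStats stats, int bound) {
        return stats.max <= bound;
    }

    public static void main(String[] args) {
        IntStats stats = new IntStats(10, 20);
        System.out.println("eq 5  -> drop? " + canDropEq(stats, 5));
        System.out.println("eq 15 -> drop? " + canDropEq(stats, 15));
        System.out.println("gt 20 -> drop? " + canDropGt(stats, 20));
    }
}
```

The same kind of check could in principle be applied to per-page statistics,
which is what the question below is about.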

I thought the statistics in DataPageHeader would be used to skip reading a
page.
However, I could not find where the statistics in DataPageHeader are used,
in either filter1 or filter2.

Could you give me some comments on this please?



--
Ryan Blue
Software Engineer
Cloudera, Inc.
