Qifan Chen created IMPALA-10494:
-----------------------------------
Summary: Making use of the min/max column stats to improve min/max
filters
Key: IMPALA-10494
URL: https://issues.apache.org/jira/browse/IMPALA-10494
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Qifan Chen
The HMS (Hive Metastore) API offers a way to store the minimum and maximum values
per column
(https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/ColumnStatisticsData.html).
For example, such stats for an integer column can be captured via a
LongColumnStatsData object
(https://hive.apache.org/javadocs/r3.0.0/api/org/apache/hadoop/hive/metastore/api/LongColumnStatsData.html).
It is desirable to use these per-column min and max stats to help form more
useful min/max filters, which in turn can reduce the amount of data scanned for
Parquet tables.
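
One possible way to use the stats, sketched here for illustration only: compare the value range carried by a min/max filter against the column's stored low/high values and estimate how much of the column the filter would let through. This is not Impala's implementation. The LongColumnStats class is a hypothetical stand-in for the low and high values held by LongColumnStatsData, and the function names, the 0.5 threshold and the assumption of uniformly distributed integer values are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongColumnStats:
    """Hypothetical stand-in for the low/high values of LongColumnStatsData."""
    low_value: int
    high_value: int


def overlap_ratio(stats: LongColumnStats, filter_min: int, filter_max: int) -> float:
    """Fraction of the column's [low, high] range covered by the filter range.

    Both ranges are treated as inclusive integer ranges, and values are
    assumed to be spread uniformly between low_value and high_value.
    """
    lo = max(stats.low_value, filter_min)
    hi = min(stats.high_value, filter_max)
    if hi < lo:
        # The ranges are disjoint: the filter rejects every value in the column.
        return 0.0
    column_width = stats.high_value - stats.low_value + 1
    return (hi - lo + 1) / column_width


def filter_is_useful(stats: LongColumnStats, filter_min: int, filter_max: int,
                     threshold: float = 0.5) -> bool:
    """Treat the filter as useful when it covers less than `threshold` of the column range.

    The threshold is an arbitrary illustrative value, not an Impala setting.
    """
    return overlap_ratio(stats, filter_min, filter_max) < threshold


if __name__ == "__main__":
    stats = LongColumnStats(low_value=0, high_value=999)
    # Narrow filter: covers a small slice of the column range.
    print(filter_is_useful(stats, 100, 199))
    # Wide filter: covers the whole column range, so it cannot prune anything.
    print(filter_is_useful(stats, -50, 2000))
```

Under these assumptions, a filter whose range spans the entire column range cannot eliminate any data and could be skipped, while a narrow or disjoint filter is worth applying during the scan.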