Dong Chen created PARQUET-281:
---------------------------------
Summary: Statistic and Filter need a mechanism to get customized
comparator from high layer user
Key: PARQUET-281
URL: https://issues.apache.org/jira/browse/PARQUET-281
Project: Parquet
Issue Type: Improvement
Reporter: Dong Chen
Assignee: Dong Chen
As discussed in HIVE-10254, we might need a customized comparator supplied by
a higher-layer user, for generating statistics when writing and for applying
filters when reading.
The problem is as follows (using the Decimal type in Hive as an example):
Decimal in Hive is mapped to Binary in Parquet. When predicates and statistics
are used to filter values, comparing Binary values in Parquet does not reflect
the correct ordering of the Decimal values in Hive. This type mapping causes
two problems:
1. When writing a Decimal column, Binary.compareTo() is used to determine and
set the column statistics (min, max). The generated statistics are not correct
from a Decimal perspective.
2. When reading with a Predicate (or Filter), the expected Decimal value is
converted to Binary, and Binary.compareTo() is used to compare the expected
value with the column statistics. Both are compared from a Binary perspective,
so the result is also incorrect.
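To make the mismatch concrete, here is a minimal sketch. It assumes the decimal
is stored as the big-endian two's-complement bytes of its unscaled value, and
it uses a plain unsigned lexicographic byte comparison as a stand-in for a
generic binary ordering. The class DecimalOrderDemo and its methods are
hypothetical, and Binary.compareTo() in Parquet may differ in detail.

```java
import java.math.BigDecimal;

public class DecimalOrderDemo {
    // Assumed encoding: big-endian two's-complement bytes of the unscaled value.
    public static byte[] encode(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Unsigned lexicographic byte comparison; a stand-in for a generic
    // binary ordering, not Parquet's actual implementation.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Print the sign of both orderings for one pair of decimals.
    static void show(String left, String right) {
        BigDecimal l = new BigDecimal(left);
        BigDecimal r = new BigDecimal(right);
        int decimalOrder = Integer.signum(l.compareTo(r));
        int byteOrder = Integer.signum(compareBytes(encode(l, 2), encode(r, 2)));
        System.out.println(left + " vs " + right
            + ": decimal=" + decimalOrder + ", bytes=" + byteOrder);
    }

    public static void main(String[] args) {
        show("-1.00", "1.00"); // sign: the negative value has the larger first byte
        show("1.00", "2.56");  // length: 2.56 needs two bytes and starts with 0x01
    }
}
```

In both pairs the left value is smaller as a decimal, yet the byte-wise
comparison orders it after the right value, so min/max statistics built this
way would be wrong for the column.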
We could add an interface for a customized comparator, and a high-level user
such as Hive would provide the comparator to Parquet, since Hive knows how to
decode the binary into a Decimal and compare it. Parquet could then switch
between the customized and the original comparison methods.
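One way the proposal could look is sketched below. This is only an
illustration under assumptions: PluggableComparatorSketch, MinMax, BYTEWISE and
decimalComparator are hypothetical names, not existing Parquet or Hive APIs,
and the comparator is modeled as a plain java.util.Comparator over byte arrays.
The statistics holder falls back to the byte-wise ordering when the higher
layer supplies no comparator.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Comparator;

public class PluggableComparatorSketch {
    // Default ordering (hypothetical stand-in): unsigned lexicographic bytes.
    public static final Comparator<byte[]> BYTEWISE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Comparator a higher layer such as Hive could supply: decode the bytes
    // as an unscaled two's-complement value, then compare as decimals.
    public static Comparator<byte[]> decimalComparator(int scale) {
        return Comparator.comparing(
            (byte[] bytes) -> new BigDecimal(new BigInteger(bytes), scale));
    }

    // Min/max tracker that uses the custom comparator when one is given.
    public static final class MinMax {
        private final Comparator<byte[]> comparator;
        private byte[] min;
        private byte[] max;

        public MinMax(Comparator<byte[]> custom) {
            this.comparator = (custom != null) ? custom : BYTEWISE;
        }

        public void update(byte[] value) {
            if (min == null || comparator.compare(value, min) < 0) min = value;
            if (max == null || comparator.compare(value, max) > 0) max = value;
        }

        public byte[] min() { return min; }
        public byte[] max() { return max; }

        // For a predicate "column = value": the data can be skipped only
        // when statistics exist and the value lies outside [min, max].
        public boolean canDropEq(byte[] value) {
            if (min == null) {
                return false;
            }
            return comparator.compare(value, min) < 0
                || comparator.compare(value, max) > 0;
        }
    }

    public static void main(String[] args) {
        MinMax byDefault = new MinMax(null);
        MinMax byDecimal = new MinMax(decimalComparator(2));
        for (String s : new String[] {"-1.00", "1.00", "2.56"}) {
            byte[] bytes = new BigDecimal(s).unscaledValue().toByteArray();
            byDefault.update(bytes);
            byDecimal.update(bytes);
        }
        System.out.println("default min: " + new BigDecimal(new BigInteger(byDefault.min()), 2));
        System.out.println("decimal min: " + new BigDecimal(new BigInteger(byDecimal.min()), 2));
    }
}
```

With the three sample values, the default ordering reports 2.56 as the minimum,
while the decimal comparator reports -1.00. The same comparator is reused on
the read path in canDropEq, so writing and filtering stay consistent.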
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)