[ 
https://issues.apache.org/jira/browse/PARQUET-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574831#comment-14574831
 ] 

Ryan Blue commented on PARQUET-281:
-----------------------------------

bq. In order to transit the comparator from Hive to Parquet . . .

There should be no need to pass a comparator between Hive and Parquet. The 
comparison can live entirely inside Parquet, because Parquet defines the types 
and implements the predicates. That means Parquet should be able to supply a 
custom comparator for any type, determined by its logical type. For this, I 
would add a `getComparator` method to `Type`, but I'd like to hear 
[~alexlevenson]'s opinion.

For example, UINT32 will need a custom comparator that sorts values that look 
negative as signed ints after the positive ones, because the high bit is data 
rather than a sign bit.
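
To make the UINT32 case concrete, here is a minimal sketch in plain Java. The 
class name `Uint32Order` and the `UINT32` constant are made up for 
illustration and are not part of Parquet; the only library call is the JDK's 
`Integer.compareUnsigned`.

```java
import java.util.Comparator;

public class Uint32Order {
    // Hypothetical comparator (not a Parquet API): orders ints by their
    // 32 bits interpreted as an unsigned value.
    static final Comparator<Integer> UINT32 = Integer::compareUnsigned;

    public static void main(String[] args) {
        int small = 1;
        int large = 0xFFFFFFFF; // 4294967295 as UINT32, but -1 as a signed int

        // Signed comparison puts the all-ones value before 1.
        System.out.println(Integer.compare(large, small) < 0); // true

        // Unsigned comparison puts it after 1, which is what UINT32 needs.
        System.out.println(UINT32.compare(large, small) > 0);  // true
    }
}
```

With the signed ordering, a min/max statistic over such a column would report 
the largest UINT32 values as the minimum.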

Once a `Comparator` is available from the `Type`, callers should pass only 
the type and get the comparator from it when it is needed.
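
A rough sketch of what that could look like, under stated assumptions: 
`SketchType`, `LogicalKind`, and `max` are hypothetical stand-ins written for 
this comment, not Parquet's actual `Type` class or any existing method. The 
point is only that the caller holds a type and asks it for the ordering.

```java
import java.util.Comparator;

public class TypeComparatorSketch {
    // Hypothetical stand-in for a logical type annotation; not Parquet's enum.
    enum LogicalKind { INT32, UINT32 }

    // Hypothetical stand-in for Type, reduced to what the sketch needs.
    static final class SketchType {
        private final LogicalKind kind;

        SketchType(LogicalKind kind) {
            this.kind = kind;
        }

        // The type picks the ordering from its logical type, so no caller
        // ever has to supply a comparator.
        Comparator<Integer> getComparator() {
            switch (kind) {
                case UINT32:
                    return Integer::compareUnsigned;
                default:
                    return Integer::compare;
            }
        }
    }

    // Statistics-style helper: receives only the type and fetches the
    // comparator at the point where a comparison is needed.
    static int max(SketchType type, int a, int b) {
        return type.getComparator().compare(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(max(new SketchType(LogicalKind.INT32), -1, 1));  // 1
        System.out.println(max(new SketchType(LogicalKind.UINT32), -1, 1)); // -1 (4294967295 unsigned)
    }
}
```

The same bit pattern yields a different maximum depending only on the logical 
type, and nothing outside the type had to know which comparison to use.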

> Statistic and Filter need a mechanism to get customized comparator from high 
> layer user
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-281
>                 URL: https://issues.apache.org/jira/browse/PARQUET-281
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>
> As discussed in HIVE-10254, we might need a customized comparator from high 
> layer user for generating statistic when writing and applying filter when 
> reading. 
> The problem is that (use Decimal type in Hive as an example):
> Decimal in Hive is mapped to Binary in Parquet. When using predicate and 
> statistic to filter values, comparing Binary values in Parquet cannot reflect 
> the correct relationship of Decimal values in Hive. This type mapping causes 
> 2 problems:
> 1. When writing Decimal column, Binary.compareTo() is used to judge and set 
> the column statistic (min, max). The generated statistic value is not correct 
> from a Decimal perspective.
> 2. When reading with a Predicate (or Filter), the expected Decimal value is 
> converted to Binary, and Binary.compareTo() is used to compare it with the 
> column statistic value. Both are compared as Binary, so the result is also 
> wrong from a Decimal perspective.
> We could add an interface for customized comparator, and high level user like 
> Hive provides the comparator to Parquet, since Hive knows how to decode the 
> binary to Decimal and compare. Then Parquet could switch between customized 
> and original comparison method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
