[ 
https://issues.apache.org/jira/browse/HIVE-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539434#comment-14539434
 ] 

Dong Chen commented on HIVE-10254:
----------------------------------

After investigating this, I think we might need some changes on the Parquet side.

*Problem:*
Decimal in Hive is mapped to {{Binary}} in Parquet. When predicates and 
statistics are used to filter values, comparing {{Binary}} values in Parquet 
does not reflect the ordering of the corresponding Decimal values in Hive. This 
type mapping causes 2 problems:
1. When writing a Decimal column, {{Binary.compareTo()}} is used to determine 
and set the column statistics (min, max). The resulting statistics are not 
correct from a Decimal perspective.
2. When reading with a Predicate (also a Filter), the expected Decimal value is 
converted to {{Binary}}, and {{Binary.compareTo()}} is used to compare it with 
the column statistics. Both sides are compared as Binary, so the result is 
again not right.
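
To illustrate the mismatch, here is a minimal standalone sketch. It assumes the Decimal is stored as a fixed-width, big-endian two's-complement unscaled value. The class {{DecimalBytesOrdering}} and its helpers are hypothetical, and the byte-wise comparison is only a stand-in for a generic binary ordering, not a copy of Parquet's {{Binary.compareTo()}}.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical demo class; not part of Hive or Parquet.
class DecimalBytesOrdering {

    // Encode the unscaled value as big-endian two's complement, sign-extended
    // to a fixed width. Assumes the value fits in `width` bytes and already
    // has at most `scale` fractional digits.
    static byte[] encode(BigDecimal d, int scale, int width) {
        byte[] raw = d.setScale(scale).unscaledValue().toByteArray();
        byte[] out = new byte[width];
        byte pad = (byte) (d.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, width - raw.length, pad);
        System.arraycopy(raw, 0, out, width - raw.length, raw.length);
        return out;
    }

    // Lexicographic byte comparison, treating each byte as signed or unsigned.
    static int compareBytes(byte[] a, byte[] b, boolean signed) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = signed ? a[i] : (a[i] & 0xFF);
            int y = signed ? b[i] : (b[i] & 0xFF);
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Decode both sides and compare numerically (same scale on both sides).
    static int compareAsDecimal(byte[] a, byte[] b, int scale) {
        BigDecimal x = new BigDecimal(new BigInteger(a), scale);
        BigDecimal y = new BigDecimal(new BigInteger(b), scale);
        return x.compareTo(y);
    }

    public static void main(String[] args) {
        byte[] minusOne = encode(new BigDecimal("-1.00"), 2, 4);
        byte[] one = encode(new BigDecimal("1.00"), 2, 4);
        byte[] two = encode(new BigDecimal("2.00"), 2, 4);

        // Unsigned byte order ranks -1.00 above 1.00.
        System.out.println(compareBytes(minusOne, one, false) > 0);
        // Signed byte order ranks 1.00 above 2.00.
        System.out.println(compareBytes(one, two, true) > 0);
        // Numeric order gives -1.00 < 1.00 < 2.00.
        System.out.println(compareAsDecimal(minusOne, one, 2) < 0
                && compareAsDecimal(one, two, 2) < 0);
    }
}
```

Under these assumptions, neither a signed nor an unsigned byte-wise ordering agrees with the numeric ordering for all values, so min/max statistics computed this way can be wrong.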

*An idea:*
I was wondering whether we could add a customized comparator as an attribute of 
the {{Binary}} class, with a higher-level user such as Hive providing the 
comparator, since Hive knows how to decode the binary into a Decimal and 
compare. {{Binary.compareTo()}} could then switch between the customized and 
the original comparison method.
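
A rough sketch of what this could look like follows. The class {{PluggableBinary}} is a hypothetical stand-in for {{Binary}}, not the actual Parquet class, and the comparator field, the constructor, and {{DECIMAL_ORDER}} are assumptions made only for illustration.

```java
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical stand-in for Parquet's Binary; not the real class or API.
class PluggableBinary implements Comparable<PluggableBinary> {

    // Comparator a caller such as Hive might supply for a Decimal column.
    // Comparing unscaled values is valid only when both sides share a scale.
    static final Comparator<byte[]> DECIMAL_ORDER =
            (a, b) -> new BigInteger(a).compareTo(new BigInteger(b));

    private final byte[] bytes;
    private final Comparator<byte[]> custom; // null means "use the default"

    PluggableBinary(byte[] bytes, Comparator<byte[]> custom) {
        this.bytes = bytes;
        this.custom = custom;
    }

    @Override
    public int compareTo(PluggableBinary other) {
        // Switch between the customized and the original comparison.
        if (custom != null) {
            return custom.compare(bytes, other.bytes);
        }
        return compareBytewise(bytes, other.bytes);
    }

    // Default ordering: unsigned lexicographic byte comparison
    // (an assumed stand-in for the original comparison method).
    static int compareBytewise(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this shape, the writer (for min/max statistics) and the reader (for predicate evaluation) would both have to use the same comparator for a given column, otherwise the statistics and the filter would disagree.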

I am not sure this solution is OK, since it requires changing the Parquet API.

Any thoughts? Other ideas?



> Parquet PPD support DECIMAL
> ---------------------------
>
>                 Key: HIVE-10254
>                 URL: https://issues.apache.org/jira/browse/HIVE-10254
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
