[ 
https://issues.apache.org/jira/browse/HIVE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873349#comment-16873349
 ] 

Zoltan Haindrich edited comment on HIVE-21407 at 6/26/19 1:54 PM:
------------------------------------------------------------------

The format may store that value however it wants; but it is the responsibility 
of Hive's type system to handle a "rightly" typed constant; for example a 
{{Char( n )}} type if that's what it is - you may ask for the expanded or the 
non-expanded form...but when you do, I think you have to consider what contracts 
Parquet is conforming to.

To keep this short; I think the following example might help:
{code}
select 'a' = 'a ', cast('a' as char(3)) = 'a ', cast('a ' as char(3)) = 'a';
{code}

Would it be possible that Parquet stores 'a ' somehow? Because if that's 
possible, then neither 'a' nor 'a  ' will match that...
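
To make the concern concrete, here is a minimal Java sketch of the three forms a char(3) value could take. The class and helper names are hypothetical and are not Hive or Parquet APIs; the sketch assumes the storage layer compares values by exact string equality, which may not hold for every reader.

```java
public class CharFormsDemo {
    // Hypothetical helper: right-pad with spaces up to the declared char(n) length.
    // Values already longer than the length are returned unchanged (no truncation).
    public static String pad(String value, int length) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical helper: remove trailing spaces only, keeping leading ones.
    public static String stripTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = "a ";                            // assumed: stored with one trailing space
        String expanded = pad("a", 3);                   // "a  "
        String nonExpanded = stripTrailingSpaces("a ");  // "a"

        // Under exact equality, neither form of the constant matches the stored value.
        System.out.println(stored.equals(expanded));     // false
        System.out.println(stored.equals(nonExpanded));  // false

        // Normalizing both sides the same way makes the comparison succeed.
        System.out.println(
            stripTrailingSpaces(stored).equals(stripTrailingSpaces(expanded))); // true
    }
}
```

The point is only that picking the expanded or the non-expanded form is safe when the stored form is known; if both sides cannot be normalized the same way, an exact-match filter can miss rows.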


> Parquet predicate pushdown is not working correctly for char column types
> -------------------------------------------------------------------------
>
>                 Key: HIVE-21407
>                 URL: https://issues.apache.org/jira/browse/HIVE-21407
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21407.2.patch, HIVE-21407.3.patch, HIVE-21407.patch
>
>
> If the 'hive.optimize.index.filter' parameter is false, the filter predicate 
> is not pushed to parquet, so the filtering only happens within Hive. If the 
> parameter is true, the filter is pushed to parquet, but for a char type, the 
> value which is pushed to Parquet will be padded with spaces:
> {noformat}
>   @Override
>   public void setValue(String val, int len) {
>     super.setValue(HiveBaseChar.getPaddedValue(val, len), -1);
>   }
> {noformat} 
> So if we have a char(10) column which contains the value "apple" and the 
> where condition looks like 'where c='apple'', the value pushed to Parquet will 
> be 'apple' followed by 5 spaces. But the stored values are not padded, so no 
> rows will be returned from Parquet.
> How to reproduce:
> {noformat}
> $ create table ppd (c char(10), v varchar(10), i int) stored as parquet;
> $ insert into ppd values ('apple', 'bee', 1),('apple', 'tree', 2),('hello', 
> 'world', 1),('hello','vilag',3);
> $ set hive.optimize.ppd.storage=true;
> $ set hive.vectorized.execution.enabled=true;
> $ set hive.vectorized.execution.enabled=false;
> $ set hive.optimize.ppd=true;
> $ set hive.optimize.index.filter=true;
> $ set hive.parquet.timestamp.skip.conversion=false;
> $ select * from ppd where c='apple';
> +--------+--------+--------+
> | ppd.c  | ppd.v  | ppd.i  |
> +--------+--------+--------+
> +--------+--------+--------+
> $ set hive.optimize.index.filter=false; or set 
> hive.optimize.ppd.storage=false;
> $ select * from ppd where c='apple';
> +-------------+--------+--------+
> |    ppd.c    | ppd.v  | ppd.i  |
> +-------------+--------+--------+
> | apple       | bee    | 1      |
> | apple       | tree   | 2      |
> +-------------+--------+--------+
> {noformat}
> The issue surfaced after the fix for 
> [HIVE-21327|https://issues.apache.org/jira/browse/HIVE-21327] was uploaded 
> upstream. Before the HIVE-21327 fix, setting the parameter 
> 'hive.parquet.timestamp.skip.conversion' to true in the parquet_ppd_char.q 
> test hid this issue.
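
The mismatch described in the issue can be sketched in plain Java without Hive or Parquet. Everything here is an illustrative assumption: `padTo` stands in for the padding applied before pushdown, the list stands in for the unpadded values stored in the file, and `exactFilter` stands in for a storage-side equality filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PaddedPushdownDemo {
    // Hypothetical stand-in for padding a char(n) literal with trailing spaces.
    public static String padTo(String value, int length) {
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a storage-side filter using exact equality.
    public static List<String> exactFilter(List<String> stored, String literal) {
        List<String> matches = new ArrayList<>();
        for (String value : stored) {
            if (value.equals(literal)) {
                matches.add(value);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Column c of the ppd table, assumed to be stored without padding.
        List<String> stored = Arrays.asList("apple", "apple", "hello", "hello");

        // Padded literal: 'apple' followed by 5 spaces, so nothing matches.
        System.out.println(exactFilter(stored, padTo("apple", 10)).size()); // 0

        // Unpadded literal: both 'apple' rows match.
        System.out.println(exactFilter(stored, "apple").size()); // 2
    }
}
```

This mirrors the empty result with 'hive.optimize.index.filter' set to true and the two-row result when the filter is evaluated inside Hive instead.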


