[ https://issues.apache.org/jira/browse/HIVE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873349#comment-16873349 ]
Zoltan Haindrich edited comment on HIVE-21407 at 6/26/19 1:54 PM:
------------------------------------------------------------------

The format may store that value however it wants; but it is Hive's type system's responsibility to handle a "rightly" typed constant, for example a {{Char( n )}} type if that's what it is. You may ask for the expanded or the non-expanded form... but when you do, I think you have to consider what contracts Parquet is conforming to.

To keep this short, I think the following example might help:
{code}
select 'a' = 'a ', cast('a' as char(3)) = 'a ', cast('a ' as char(3)) = 'a';
{code}

Would it be possible that Parquet stores 'a ' somehow? Because if that's possible, then neither 'a' nor 'a ' will match that...


was (Author: kgyrtkirk):
The format may store that value however it wants; but it is Hive's type system's responsibility to handle a "rightly" typed constant, for example a Char(n) type if that's what it is. You may ask for the expanded or the non-expanded form... but when you do, I think you have to consider what contracts Parquet is conforming to.

To keep this short, I think the following example might help:
{code}
select 'a' = 'a ', cast('a' as char(3)) = 'a ', cast('a ' as char(3)) = 'a';
{code}

Would it be possible that Parquet stores 'a ' somehow? Because if that's possible, then neither 'a' nor 'a ' will match that...

> Parquet predicate pushdown is not working correctly for char column types
> -------------------------------------------------------------------------
>
>                 Key: HIVE-21407
>                 URL: https://issues.apache.org/jira/browse/HIVE-21407
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-21407.2.patch, HIVE-21407.3.patch, HIVE-21407.patch
>
>
> If the 'hive.optimize.index.filter' parameter is false, the filter predicate
> is not pushed to Parquet, so the filtering only happens within Hive.
> If the parameter is true, the filter is pushed to Parquet, but for a char type, the
> value which is pushed to Parquet will be padded with spaces:
> {noformat}
>   @Override
>   public void setValue(String val, int len) {
>     super.setValue(HiveBaseChar.getPaddedValue(val, len), -1);
>   }
> {noformat}
> So if we have a char(10) column which contains the value "apple" and the
> where condition looks like 'where c='apple'', the value pushed to Parquet will
> be 'apple' followed by 5 spaces. But the stored values are not padded, so no
> rows will be returned from Parquet.
> How to reproduce:
> {noformat}
> $ create table ppd (c char(10), v varchar(10), i int) stored as parquet;
> $ insert into ppd values ('apple', 'bee', 1),('apple', 'tree', 2),('hello', 'world', 1),('hello','vilag',3);
> $ set hive.optimize.ppd.storage=true;
> $ set hive.vectorized.execution.enabled=true;
> $ set hive.vectorized.execution.enabled=false;
> $ set hive.optimize.ppd=true;
> $ set hive.optimize.index.filter=true;
> $ set hive.parquet.timestamp.skip.conversion=false;
> $ select * from ppd where c='apple';
> +--------+--------+--------+
> | ppd.c  | ppd.v  | ppd.i  |
> +--------+--------+--------+
> +--------+--------+--------+
> $ set hive.optimize.index.filter=false; or set hive.optimize.ppd.storage=false;
> $ select * from ppd where c='apple';
> +-------------+--------+--------+
> |    ppd.c    | ppd.v  | ppd.i  |
> +-------------+--------+--------+
> | apple       | bee    | 1      |
> | apple       | tree   | 2      |
> +-------------+--------+--------+
> {noformat}
> The issue surfaced after the fix for
> [HIVE-21327|https://issues.apache.org/jira/browse/HIVE-21327] was uploaded
> upstream. Before the HIVE-21327 fix, setting the parameter
> 'hive.parquet.timestamp.skip.conversion' to true in the parquet_ppd_char.q
> test hid this issue.
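
The mismatch described in the issue can be reproduced outside Hive with plain strings. What follows is only a minimal sketch and not the attached patch: `padRight` and `stripTrailingSpaces` are hypothetical helpers. `padRight` stands in for the padding that `HiveBaseChar.getPaddedValue` is described as applying, and it is assumed here that the comparison on the storage side is an exact string match.

```java
public class CharPaddingSketch {

    // Hypothetical stand-in for the padding step: right-pad val with spaces
    // up to len characters. Longer values are returned unchanged here.
    static String padRight(String val, int len) {
        StringBuilder sb = new StringBuilder(val);
        while (sb.length() < len) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Hypothetical helper: drop trailing spaces only, keeping leading and
    // interior spaces intact.
    static String stripTrailingSpaces(String val) {
        int end = val.length();
        while (end > 0 && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = "apple";               // value as stored, unpadded
        String pushed = padRight("apple", 10); // constant padded for char(10)

        // Exact match between the stored value and the padded constant fails.
        System.out.println(stored.equals(pushed));                      // false
        // Comparing against the unpadded form of the constant succeeds.
        System.out.println(stored.equals(stripTrailingSpaces(pushed))); // true
    }
}
```

The first comparison corresponds to the empty result with 'hive.optimize.index.filter' set to true. The second only shows that the two forms agree once trailing spaces are ignored, and whether that is the right contract for the pushed-down constant is the question raised in the comment.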