[ 
https://issues.apache.org/jira/browse/HIVE-11312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11312:
-----------------------------------------
    Attachment: HIVE-11312.2.patch

We don't need a CHAR type in sargs. Instead, we need to convert the constant 
literal to the column type. For charColumn = "1", the literal "1" should be 
padded to the length specified in the charColumn type info. ORC stores char 
columns as is, without stripping the trailing whitespace, so the column stats 
will also have spaces at the end. If we map the constant literal to the column 
type, then we can get away with not having a CHAR type in sargs.
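
To illustrate the idea only (this is not the code in the attached patch), here 
is a minimal sketch of padding the literal before comparing it against the 
column stats. The class and method names are hypothetical, and literals longer 
than the declared length are simply returned unchanged.

```java
// Hypothetical helper, not taken from HIVE-11312.2.patch.
final class CharLiteralPadding {

    /** Right-pads the literal with spaces up to the declared CHAR length. */
    static String padToCharLength(String literal, int charLength) {
        if (literal.length() >= charLength) {
            return literal; // over-long literals are out of scope for this sketch
        }
        StringBuilder sb = new StringBuilder(charLength);
        sb.append(literal);
        while (sb.length() < charLength) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Returns true when the literal may fall inside the [min, max] stats range. */
    static boolean mayContain(String min, String max, String literal) {
        return literal.compareTo(min) >= 0 && literal.compareTo(max) <= 0;
    }

    public static void main(String[] args) {
        // Stats for a CHAR(10) column holding the single value '1'.
        String min = padToCharLength("1", 10);
        String max = padToCharLength("1", 10);

        // Unpadded literal: the range check rejects it, so no rows come back.
        System.out.println(mayContain(min, max, "1"));
        // Literal padded to the column type: the range check accepts it.
        System.out.println(mayContain(min, max, padToCharLength("1", 10)));
    }
}
```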

[~gopalv]/[~owen.omalley] Can someone take a look at the patch?

> ORC format: where clause with CHAR data type not returning any rows
> -------------------------------------------------------------------
>
>                 Key: HIVE-11312
>                 URL: https://issues.apache.org/jira/browse/HIVE-11312
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.0, 1.2.1
>            Reporter: Thomas Friedrich
>            Assignee: Thomas Friedrich
>              Labels: orc
>         Attachments: HIVE-11312.1.patch, HIVE-11312.2.patch
>
>
> Test case:
> Setup: 
> create table orc_test( col1 string, col2 char(10)) stored as orc 
> tblproperties ("orc.compress"="NONE");
> insert into orc_test values ('val1', '1');
> Query:
> select * from orc_test where col2='1'; 
> The query returns no rows.
> The problem was introduced with HIVE-10286, in class RecordReaderImpl, method 
> evaluatePredicateRange.
> Old code:
> - Object baseObj = predicate.getLiteral(PredicateLeaf.FileFormat.ORC);
> - Object minValue = getConvertedStatsObj(min, baseObj);
> - Object maxValue = getConvertedStatsObj(max, baseObj);
> - Object predObj = getBaseObjectForComparison(baseObj, minValue);
> New code:
> + Object baseObj = predicate.getLiteral();
> + Object minValue = getBaseObjectForComparison(predicate.getType(), min);
> + Object maxValue = getBaseObjectForComparison(predicate.getType(), max);
> + Object predObj = getBaseObjectForComparison(predicate.getType(), baseObj);
> The values for min and max are Strings that contain as many characters as 
> the CHAR column declares. For example, if the type is CHAR(10) and the row 
> has value 1, the value of String min is "1         ".
> Before Hive 1.2, the method getConvertedStatsObj would call 
> StringUtils.stripEnd(statsObj.toString(), null); which would remove the 
> trailing spaces from min and max. Later in the compareToRange method, it was 
> able to compare "1" with "1".
> In Hive 1.2, with the use of the getBaseObjectForComparison method, it simply 
> returns obj.toString() if the data type is String, which means minValue and 
> maxValue are still "1         ".
> As a result, the compareToRange method will return a wrong value 
> ("1".compareTo("1         ") returns -9 instead of 0).
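
The comparison described in the report can be reproduced with plain Java 
strings. The sketch below is a standalone illustration, not Hive code: the 
stripEnd helper is a hand-written stand-in for the whitespace-stripping case of 
StringUtils.stripEnd(s, null), and the class name is made up.

```java
// Standalone illustration; none of these names come from Hive.
final class TrailingSpaceComparison {

    /** Stand-in for StringUtils.stripEnd(s, null): drops trailing whitespace. */
    static String stripEnd(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String min = "1         "; // stats value for CHAR(10): "1" plus 9 spaces

        // When one string is a prefix of the other, compareTo returns the
        // difference in lengths: 1 - 10 = -9.
        System.out.println("1".compareTo(min));

        // With the trailing spaces removed, as before Hive 1.2, the values match.
        System.out.println("1".compareTo(stripEnd(min)));
    }
}
```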



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
