[jira] [Commented] (HIVE-11735) Different results when multiple if() functions are used

Ashutosh Chauhan (JIRA) Sat, 17 Oct 2015 10:49:12 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962009#comment-14962009
 ]


Ashutosh Chauhan commented on HIVE-11735:
-----------------------------------------

I think the problem here stems from this line:
{code}
aggregations.put(expressionTree.toStringTree().toLowerCase(), expressionTree);
{code}

I think that for your particular query, simply removing {{toLowerCase()}} would 
solve your problem. Do you really need the other changes for column aliases and 
such in RR?

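The intent of this map is to detect duplicate functions in aggregations, so that 
we do not compute them twice. However, it blindly calls {{toLowerCase()}} on 
the full expression tree, ignoring the fact that there might be constant 
literals in there. As a result, two expressions that differ only in the case of 
a string literal (such as 'chetna' and 'Chetna') get the same key and are 
treated as duplicates. There are two possible solutions here: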

* Eliminate this logic altogether from this phase. Don't bother about 
duplicates in phase 1 analysis. Instead, write a rule on either the Calcite 
operator tree or the Hive operator tree that walks the expressions, detects 
duplicates, and fixes up the operator tree to refer to a single expression tree.
* Write a utility function that takes an expression tree as an argument and 
returns the lower-case version of its string tree, while leaving constant 
string literals in their original case. Then use this string representation as 
the key in that map.
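
To make the second option concrete, here is a minimal sketch of such a utility. It is not taken from the attached patch or from Hive itself: the class and method names are hypothetical, it works on the string form of the tree rather than on the tree nodes, and it assumes string literals show up in that string wrapped in single or double quotes with backslash escapes. The two sample string trees in {{main}} are simplified illustrations, not real parser output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: builds a map key from the string form
// of an expression tree, lower-casing everything except quoted string literals.
final class AggregationKeys {

    private AggregationKeys() {}

    // Assumption: literals are wrapped in single or double quotes, and a
    // backslash inside a literal escapes the character that follows it.
    static String literalPreservingLowerCase(String stringTree) {
        StringBuilder out = new StringBuilder(stringTree.length());
        char quote = 0; // 0 while outside a literal, otherwise the opening quote
        for (int i = 0; i < stringTree.length(); i++) {
            char c = stringTree.charAt(i);
            if (quote == 0) {
                if (c == '\'' || c == '"') {
                    quote = c; // entering a literal
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c); // inside a literal: keep the original case
                if (c == '\\' && i + 1 < stringTree.length()) {
                    out.append(stringTree.charAt(++i)); // copy escaped char as-is
                } else if (c == quote) {
                    quote = 0; // leaving the literal
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Simplified, made-up string trees for the two aggregations in the report.
        String a = "(TOK_FUNCTION MIN (TOK_FUNCTION if (= (TOK_TABLE_OR_COL name) 'chetna') 4 3))";
        String b = "(TOK_FUNCTION min (TOK_FUNCTION IF (= (TOK_TABLE_OR_COL NAME) 'Chetna') 4 3))";

        // Blind lower-casing: both expressions collapse to one key.
        Map<String, String> blind = new HashMap<>();
        blind.put(a.toLowerCase(), a);
        blind.put(b.toLowerCase(), b);
        System.out.println("blind keys: " + blind.size());

        // Literal-preserving lower-casing: the two expressions stay distinct.
        Map<String, String> aware = new HashMap<>();
        aware.put(literalPreservingLowerCase(a), a);
        aware.put(literalPreservingLowerCase(b), b);
        System.out.println("literal-aware keys: " + aware.size());
    }
}
```

With the blind key the map ends up with one entry, which is the collision described above; with the literal-preserving key it keeps two. Function names and column references still compare case-insensitively, so genuine duplicates such as {{MIN(...)}} and {{min(...)}} over the same expression would still be detected. A version that walks the tree and checks each node's token type would be more robust than scanning the string for quotes.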

IMHO, Option 1 is the cleaner approach. However, it might be a big change 
touching various pieces of planning.
Option 2 is a much more local and contained change, but kinda inelegant.

cc: [~jpullokkaran] if he has other ideas. 

> Different results when multiple if() functions are used 
> --------------------------------------------------------
>
>                 Key: HIVE-11735
>                 URL: https://issues.apache.org/jira/browse/HIVE-11735
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 1.0.0, 1.1.1, 1.2.1
>            Reporter: Chetna Chaudhari
>            Assignee: Chetna Chaudhari
>         Attachments: HIVE-11735.patch
>
>
> The Hive if() UDF returns wrong results when its condition is a string 
> equality and two calls differ only in the case of the string literal. 
> Observation:
>    1) if( name = 'chetna' , 3, 4) and if( name = 'Chetna', 3, 4) are both 
> treated as equal.
>    2) The result of the rightmost UDF is pushed to the predicates on the left 
> side, leading to the same result for both UDFs.
> How to reproduce the issue:
> 1) CREATE TABLE `sample`(
>   `name` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1425075745');
> 2) insert into table sample values ('chetna');
> 3) select min(if(name = 'chetna', 4, 3)) , min(if(name='Chetna', 4, 3))  from 
> sample; 
>     This will give result : 
>     3    3
>     Expected result:
>     4    3
> 4) select min(if(name = 'Chetna', 4, 3)) , min(if(name='chetna', 4, 3))  from 
> sample; 
>     This will give result 
>     4    4
>     Expected result:
>     3    4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
