[ 
https://issues.apache.org/jira/browse/HIVE-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase resolved HIVE-11132.
-------------------------------
    Resolution: Won't Fix
      Assignee: Rich Haase

The interaction between these two parameters is undesirable, but rare enough 
that it's probably not worth the effort of fixing.  This JIRA can serve as 
documentation of the problem for anyone who encounters it in future.

> Queries using join and group by produce incorrect output when 
> hive.auto.convert.join=false and hive.optimize.reducededuplication=true
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11132
>                 URL: https://issues.apache.org/jira/browse/HIVE-11132
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Rich Haase
>            Assignee: Rich Haase
>
> Queries using join and group by produce multiple output rows with the same
> key when hive.auto.convert.join=false and
> hive.optimize.reducededuplication=true.  This interaction between
> configuration parameters is unexpected; it should at least be well
> documented and should likely be considered a bug.
> For example:
> hive> set hive.auto.convert.join = false;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         79
> XYZ         74
> XYZ         297
> XYZ         66
> hive> set hive.auto.convert.join = true;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
>     > FROM foo
>     > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
>     > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
>     > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
>     > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
>     > GROUP BY foo.id;
> XYZ         516
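
The four rows in the first result sum to 516 (79 + 74 + 297 + 66), which is the single count returned with hive.auto.convert.join=true. That is consistent with partial aggregates being emitted without a final merge, though the report does not confirm the cause. Below is a minimal Python sketch for spotting this symptom in exported query output. The function names are hypothetical and not part of Hive, and the rows are copied from the transcript above.

```python
from collections import defaultdict

# Rows copied from the report: output with hive.auto.convert.join=false.
split_rows = [("XYZ", 79), ("XYZ", 74), ("XYZ", 297), ("XYZ", 66)]
# Row copied from the report: output with hive.auto.convert.join=true.
expected_rows = [("XYZ", 516)]


def duplicate_keys(rows):
    """Return the sorted keys that appear in more than one output row."""
    seen = defaultdict(int)
    for key, _ in rows:
        seen[key] += 1
    return sorted(k for k, n in seen.items() if n > 1)


def merge_counts(rows):
    """Sum the counts per key, as a final GROUP BY stage would."""
    totals = defaultdict(int)
    for key, count in rows:
        totals[key] += count
    return dict(totals)


# A GROUP BY result should never repeat a key; here "XYZ" does.
print(duplicate_keys(split_rows))
# The merged partial counts match the result from the other setting.
print(merge_counts(split_rows) == merge_counts(expected_rows))
```

A non-empty list from duplicate_keys on the output of a GROUP BY query is a sign that the result may be affected by this issue.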



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
