[ https://issues.apache.org/jira/browse/HIVE-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rich Haase resolved HIVE-11132.
-------------------------------
Resolution: Won't Fix
Assignee: Rich Haase
The interaction between these two parameters is undesirable, but rare enough
that it's probably not worth the effort of fixing. This JIRA can serve as
documentation of the problem for anyone who encounters it in future.
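For anyone who hits this, a minimal sketch of the session settings that avoid the combination described in the issue below. Only the first option is demonstrated in this report (it returned a single row, XYZ 516). The second option, disabling hive.optimize.reducededuplication instead, is an assumption and was not verified here. The four rows returned in the failing case (79 + 74 + 297 + 66) sum to 516, which is consistent with partial counts that were never merged into one row per key.

```sql
-- Option 1: the setting shown in the issue description to return one row per key.
set hive.auto.convert.join = true;
set hive.optimize.reducededuplication = true;

-- Option 2 (assumption, not verified in this report): keep map-join conversion
-- off, but disable the reduce deduplication optimization instead.
set hive.auto.convert.join = false;
set hive.optimize.reducededuplication = false;
```

Either option would be set per session before running the affected query. Which one is preferable depends on whether the job relies on map-join conversion or on the deduplication optimization.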
> Queries using join and group by produce incorrect output when
> hive.auto.convert.join=false and hive.optimize.reducededuplication=true
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-11132
> URL: https://issues.apache.org/jira/browse/HIVE-11132
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Rich Haase
> Assignee: Rich Haase
>
> Queries using join and group by produce multiple output rows with the same
> key when hive.auto.convert.join=false and
> hive.optimize.reducededuplication=true. This interaction between
> configuration parameters is unexpected; it should be well documented at the
> very least, and should likely be considered a bug.
> e.g.
> hive> set hive.auto.convert.join = false;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
> > FROM foo
> > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
> > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
> > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
> > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
> > GROUP BY foo.id;
> XYZ 79
> XYZ 74
> XYZ 297
> XYZ 66
> hive> set hive.auto.convert.join = true;
> hive> set hive.optimize.reducededuplication = true;
> hive> SELECT foo.id, count(*) as factor
> > FROM foo
> > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
> > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
> > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
> > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
> > GROUP BY foo.id;
> XYZ 516