[ https://issues.apache.org/jira/browse/HIVE-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201521#comment-17201521 ]
Jiehong Lian edited comment on HIVE-12778 at 9/24/20, 1:32 PM: --------------------------------------------------------------- It's a bug when the having clause contains the aggregator with distinct hint, while the aggregator does not in the select clause. such as: {code:java} SELECT key FROM src GROUP BY key HAVING COUNT(value) >= 4 and count(distinct value) > 1 ;{code} I have a patch fix the bug. {code:java} // patch.diff diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java index 35d25138fe..4450bc2d47 100644 --- a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java +++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java @@ -214,6 +214,14 @@ public void clearDistinctFuncExprsForClause(String clause) { } }+ public void addDistinctFuncExprsForClause(String clause, List<ASTNode> dists) { + if (destToDistinctFuncExprs.containsKey(clause)) { + destToDistinctFuncExprs.get(clause).addAll(dists); + } else { + destToDistinctFuncExprs.put(clause, dists); + } + } + public void setDistinctFuncExprsForClause(String clause, List<ASTNode> ast) { destToDistinctFuncExprs.put(clause, ast); } diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java index e05474c2de..ec2fd8df7a 100644 --- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java +++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java @@ -1408,8 +1408,10 @@ public boolean doPhase1(ASTNode ast, QB qb, Phase1Ctx ctx_1, PlannerContext plan case HiveParser.TOK_HAVING: qbp.setHavingExprForClause(ctx_1.dest, ast); - qbp.addAggregationExprsForClause(ctx_1.dest, - doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest)); + LinkedHashMap<String, ASTNode> aggsInHaving = + doPhase1GetAggregationsFromSelect(ast, qb, ctx_1.dest); + qbp.addAggregationExprsForClause(ctx_1.dest, aggsInHaving); + qbp.addDistinctFuncExprsForClause(ctx_1.dest, doPhase1GetDistinctFuncExprs(aggsInHaving)); break; case HiveParser.KW_WINDOW: {code} was (Author: dindin5258): It's a bug when the having clause contains the aggregator with distinct hint, while the aggregator does not in the select clause. such as: {code:java} // code placeholder SELECT key FROM src GROUP BY key HAVING COUNT(value) >= 4 and count(distinct value) > 1 ;{code} I have a patch fix the bug. > Having with count distinct doesn't work for special combination > --------------------------------------------------------------- > > Key: HIVE-12778 > URL: https://issues.apache.org/jira/browse/HIVE-12778 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 1.1.0, 1.2.1 > Reporter: Peter Brejcak > Priority: Major > > There is problem for combination of count(distinct ) in having clause without > count(distinct ) in select clause. > First case returns error *FAILED: SemanticException [Error 10002]: Line > Invalid column reference* (unexpected) > If I add count(distinct ) to select clause result is ok (expected). > Please run code to see it. > Steps to reproduce: > {code} > create table table_subquery_having_problem (id int, value int); > insert into table table_subquery_having_problem values (1,1); > insert into table table_subquery_having_problem values (1,2); > insert into table table_subquery_having_problem values (1,3); > insert into table table_subquery_having_problem values (1,4); > insert into table table_subquery_having_problem values (1,5); > insert into table table_subquery_having_problem values (1,6); > insert into table table_subquery_having_problem values (1,7); > insert into table table_subquery_having_problem values (1,8); > insert into table table_subquery_having_problem values (1,9); > select x.id from table_subquery_having_problem x > group by x.id > having count(distinct x.value)>1; -- result is ERROR > select x.id, count(distinct x.value) from table_subquery_having_problem x > group by x.id > having count(distinct x.value)>1; --result is OK > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)