Zoltan Haindrich created HIVE-22363:
---------------------------------------
Summary: ReduceDeduplication may leave an invalid GroupByOperator behind in some cases
Key: HIVE-22363
URL: https://issues.apache.org/jira/browse/HIVE-22363
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 3.1.2
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

Since HIVE-11387, ReduceDeduplication may traverse {{GroupByOperators}} [as well|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L244].
However, the removal logic only removes the first parent. If some other operator (a FIL in this case) sits between the sink and the GBY, the removal may not happen [here|https://github.com/apache/hive/blob/c6626edb65c2cd00576647e54db1995628fe64da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java#L458], and an invalid {{GroupByOperator}} is left behind.

{code}
set hive.cbo.enable=false;

drop table if exists xl1;
create table xl1 as select '1' as mdl_yr_desc, 2 as seq_no, '3' as opt_desc1, 4 as opt_desc, 1 as row_num;

explain
select trim(base.mdl_yr_desc) mdl_yr_desc, trim(base.opt_desc) opt_desc
from (
  SELECT trim(mdl_yr_desc) mdl_yr_desc, concat_ws(' ', collect_set(trim(opt_desc1))) AS opt_desc
  from (
    select t14304.*
    from (select * from xl1) t14304
    where row_num = 1
    order by trim(mdl_yr_desc), cast(seq_no as int) asc
  ) x
  group by trim(mdl_yr_desc)
) base
inner join (select 1 as v) dedup
  on trim(base.mdl_yr_desc) != dedup.v
group by trim(base.mdl_yr_desc), trim(base.opt_desc);
{code}
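To make the failure mode in the description concrete, here is a toy Python model of the parent lookup. It is not Hive code: the {{Op}} class and both helper functions are hypothetical. A check that looks only at the sink's immediate parent finds the GBY when it feeds the sink directly, but misses it once a FIL sits in between. The second helper, which walks past FIL operators, only illustrates one possible direction; it is an assumption, not the actual fix for this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Op:
    """Hypothetical toy operator node; 'kind' mimics Hive names such as GBY, FIL, RS."""
    kind: str
    parents: List["Op"] = field(default_factory=list)


def chain(*kinds: str) -> Op:
    """Build a linear pipeline (first kind is the root) and return its last operator."""
    last: Optional[Op] = None
    for kind in kinds:
        last = Op(kind, [last] if last is not None else [])
    assert last is not None
    return last


def map_gby_first_parent_only(sink: Op) -> Optional[Op]:
    """Models the reported behaviour: only the sink's immediate parent is checked."""
    parent = sink.parents[0] if len(sink.parents) == 1 else None
    return parent if parent is not None and parent.kind == "GBY" else None


def map_gby_through_filters(sink: Op) -> Optional[Op]:
    """Hypothetical alternative: follow single-parent links, skipping FIL operators."""
    cursor = sink
    while len(cursor.parents) == 1:
        cursor = cursor.parents[0]
        if cursor.kind == "GBY":
            return cursor
        if cursor.kind != "FIL":
            return None  # any other operator stops the search
    return None


if __name__ == "__main__":
    direct = chain("TS", "RS", "GBY", "RS")           # GBY feeds the sink directly
    filtered = chain("TS", "RS", "GBY", "FIL", "RS")  # a FIL sits between GBY and sink

    print(map_gby_first_parent_only(direct).kind)     # GBY
    print(map_gby_first_parent_only(filtered))        # None: the GBY is missed
    print(map_gby_through_filters(filtered).kind)     # GBY
```

The missed lookup in the {{filtered}} case corresponds to the situation in the repro above, where the GBY is not removed because it is not the first parent of the sink.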