okumin commented on code in PR #6128:
URL: https://github.com/apache/hive/pull/6128#discussion_r2431314962
##########
ql/src/test/queries/clientpositive/groupingset_optimize_hive_28489.q:
##########
@@ -1,6 +1,22 @@
-- SORT_QUERY_RESULTS
create table grp_set_test (key string, value string, col0 int, col1 int, col2 int, col3 int);
+
+-- UNION case, can't be optimized
+set hive.optimize.grouping.set.threshold=1;
+with sub_qr as (select col2 from grp_set_test)
+select grpBy_col, sum(col2)
+from
+( select 'abc' as grpBy_col, col2 from sub_qr union all select 'def' as grpBy_col, col2 from sub_qr) x
+group by grpBy_col with rollup;
Review Comment:
I confirmed that the master branch throws an NPE on this query.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupingSetOptimizer.java:
##########
@@ -227,7 +227,8 @@ private String selectPartitionColumn(GroupByOperator gby, Operator<?> parentOp)
String partitionCol = null;
for (ColStatistics col: columnStatistics) {
String colName = col.getColumnName();
- if (parentOp.getColumnExprMap().containsKey(colName) && candidates.contains(colName)) {
+ if (null != parentOp.getColumnExprMap() && parentOp.getColumnExprMap().containsKey(colName) &&
+ candidates.contains(colName)) {
Review Comment:
Confidence = 20%. My rough feeling is that the safest approach is to
reject UNION + GROUPING SETS here. I'm still recalling how this optimizer
behaves:
https://github.com/apache/hive/blob/5050529286c7ae131af0602f22e75c2ab72319fe/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupingSetOptimizer.java#L143-L182
cc: @ngsg
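
For illustration only, here is a minimal sketch of the early-reject idea, hoisting the null check out of the loop. It assumes the parent's column expression map is null in the UNION case, as the patch suggests. `PartitionColumnSketch` and its plain-collection parameters are hypothetical stand-ins, not Hive's actual `GroupingSetOptimizer` types.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the selection logic; not Hive's GroupingSetOptimizer.
class PartitionColumnSketch {

    // Returns the first stats column that the parent maps and that is also a
    // candidate, or null when nothing qualifies.
    static String selectPartitionColumn(List<String> statsColumns,
                                        Map<String, String> columnExprMap,
                                        Set<String> candidates) {
        // Assumption: a null map means the parent (e.g. a UNION) exposes no
        // column expressions, so the optimization is rejected up front.
        if (columnExprMap == null) {
            return null;
        }
        for (String colName : statsColumns) {
            if (columnExprMap.containsKey(colName) && candidates.contains(colName)) {
                return colName;
            }
        }
        return null;
    }
}
```

With this shape, a null map yields no partition column instead of an NPE, and the caller can skip the rewrite whenever the result is null.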
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]