[ https://issues.apache.org/jira/browse/HIVE-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhang Li updated HIVE-19653:
----------------------------
Description:
Consider the following query:
{code:sql}
CREATE TABLE T1(a STRING, b STRING, s BIGINT);
INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
SELECT * FROM (
  SELECT a, b, sum(s)
  FROM T1
  GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
) t WHERE a IS NOT NULL;
{code}
When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query outputs:
{noformat}
NULL NULL 123456
NULL bbbb 123456
aaaa NULL 123456
aaaa bbbb 123456
{noformat}
The predicate "a IS NOT NULL" has no effect: rows with NULL in column a are still returned, which is incorrect. When performing PPD for a GBY operator, we must make sure that every grouping set contains the expression being pushed down before pushing it below the GBY. Otherwise the GBY changes the expression's value after the filter has run (a column missing from a grouping set is output as NULL for that set), and the result is wrong.

was:
Consider the following query:
{code:java}
CREATE TABLE T1(a STRING, b STRING, s BIGINT);
INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
SELECT * FROM (
  SELECT a, b, sum(s)
  FROM T1
  GROUP BY a, b GROUPING SETS ((a), (a, b))
) t WHERE a IS NOT NULL;
{code}
When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query will output:
{code:java}
NULL NULL 123456
NULL bbbb 123456
aaaa NULL 123456
aaaa bbbb 123456
{code}
We can see the predicate "a IS NOT NULL" takes no effect, which is incorrect. When performing PPD optimization for a GBY operator, we should make sure all grouping sets contains the processing expr before pushdown. otherwise the expr value after GBY is changed and the result is wrong.
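The pushdown rule in the updated description can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Hive's optimizer code: the class and the methods `groupBy`, `whereANotNull` and `canPushDown` are illustrative names only. It evaluates GROUPING SETS ((), (a), (b), (a, b)) over the single row of T1, applies "a IS NOT NULL" either after the GBY or pushed below it, and checks whether every grouping set keeps the columns the predicate references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Hive code: all names here are illustrative.
public class GroupingSetsPpdSketch {

    // A row of (a, b, s); a null field models a key rolled up to NULL.
    record Row(String a, String b, long s) {}

    // GROUPING SETS ((), (a), (b), (a, b)) as the set of keys each one keeps.
    static final List<Set<String>> ALL_SETS =
            List.of(Set.of(), Set.of("a"), Set.of("b"), Set.of("a", "b"));

    // Simplified GBY with grouping sets: for each set, keys not in the set
    // become NULL, then sum(s) is computed per remaining key combination.
    static List<Row> groupBy(List<Row> input, List<Set<String>> sets) {
        List<Row> out = new ArrayList<>();
        for (Set<String> gs : sets) {
            Map<List<String>, Long> sums = new LinkedHashMap<>();
            for (Row r : input) {
                String a = gs.contains("a") ? r.a() : null;
                String b = gs.contains("b") ? r.b() : null;
                sums.merge(Arrays.asList(a, b), r.s(), Long::sum);
            }
            sums.forEach((k, v) -> out.add(new Row(k.get(0), k.get(1), v)));
        }
        return out;
    }

    // The predicate from the report: a IS NOT NULL.
    static List<Row> whereANotNull(List<Row> rows) {
        return rows.stream().filter(r -> r.a() != null).toList();
    }

    // Pushing a predicate below the GBY is only safe if every grouping set
    // keeps all columns the predicate references; otherwise the GBY can turn
    // those columns into NULL after the filter has already run.
    static boolean canPushDown(Set<String> predicateCols, List<Set<String>> sets) {
        return sets.stream().allMatch(gs -> gs.containsAll(predicateCols));
    }

    public static void main(String[] args) {
        List<Row> t1 = List.of(new Row("aaaa", "bbbb", 123456L));

        List<Row> filterAfter = whereANotNull(groupBy(t1, ALL_SETS));  // correct plan
        List<Row> filterBelow = groupBy(whereANotNull(t1), ALL_SETS);  // unsafe pushdown

        System.out.println("filter after GBY: " + filterAfter);
        System.out.println("filter below GBY: " + filterBelow);
        System.out.println("pushdown safe for ((), (a), (b), (a, b)): "
                + canPushDown(Set.of("a"), ALL_SETS));
        System.out.println("pushdown safe for ((a), (a, b)): "
                + canPushDown(Set.of("a"), List.of(Set.of("a"), Set.of("a", "b"))));
    }
}
```

With the filter applied after the GBY, only the rows from grouping sets (a) and (a, b) survive (2 rows). When the filter is pushed below the GBY, the input row passes and all 4 grouping-set rows come back, matching the incorrect output in the report. `canPushDown` returns false for ((), (a), (b), (a, b)) and true for ((a), (a, b)), where every set keeps column a.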
> Incorrect predicate pushdown for groupby with grouping sets
> -----------------------------------------------------------
>
>                 Key: HIVE-19653
>                 URL: https://issues.apache.org/jira/browse/HIVE-19653
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 3.1.0
>            Reporter: Zhang Li
>            Assignee: Zhang Li
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: HIVE-19653.patch
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)