[ https://issues.apache.org/jira/browse/HIVE-14630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei Yan reassigned HIVE-14630:
------------------------------

    Assignee: Wei Yan

Enable PPD for AND conditions when CBO is disabled
--------------------------------------------------

                Key: HIVE-14630
                URL: https://issues.apache.org/jira/browse/HIVE-14630
            Project: Hive
         Issue Type: Bug
         Components: Logical Optimizer
   Affects Versions: 2.2.0
           Reporter: Chao Sun
           Assignee: Wei Yan

Currently, the PPD (predicate pushdown) optimization does not seem to handle AND conditions well when CBO (the cost-based optimizer) is not used. To illustrate with an example:

Table a:
|| col || type || part_col? ||
| id | int | no |
| datestr | string | yes |

Table b:
|| col || type || part_col? ||
| id | int | no |

Consider the following query:
{code}
SELECT a.id FROM a JOIN b
ON a.id = b.id
WHERE a.datestr >= '2016-08-20'
AND rand() > 0.5
{code}

For this query, the plan is:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: a
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: id (type: bigint)
                sort order: +
                Map-reduce partition columns: id (type: bigint)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                value expressions: datestr (type: string)
          TableScan
            alias: b
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: id is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Reduce Output Operator
                key expressions: id (type: bigint)
                sort order: +
                Map-reduce partition columns: id (type: bigint)
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 id (type: bigint)
            1 id (type: bigint)
          outputColumnNames: _col0, _col2
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Filter Operator
            predicate: ((_col2 >= '2016-08-20') and (rand() > 0.5)) (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: bigint)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

Note that the predicate {{a.datestr >= '2016-08-20'}} is not pushed down, even though it is deterministic and refers only to table {{a}}. Because {{rand()}} is non-deterministic, the whole AND predicate is treated as ineligible for pushdown, so both conjuncts end up in the Filter Operator after the Join Operator instead of the deterministic one being applied at the TableScan of {{a}}.
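The pushdown this issue asks for can be approximated by hand. The sketch below is a hypothetical workaround, not something proposed in the issue: it declares the two tables as described above (assuming a plain partitioned layout with default storage options) and rewrites the query so that only the deterministic conjunct on {{a.datestr}} is applied in a subquery before the join, while {{rand() > 0.5}} stays above the join. Because the moved conjunct is deterministic, references only columns of {{a}}, and the join is an inner join, the rewritten query should select rows under the same conditions as the original. Whether a particular Hive version then prunes partitions of {{a}} for it has not been checked here.

{code}
-- Tables as described in the issue (storage options assumed default).
CREATE TABLE a (id INT) PARTITIONED BY (datestr STRING);
CREATE TABLE b (id INT);

-- Hand-pushed version of the query (hypothetical workaround, not Hive output):
-- the deterministic conjunct on a's partition column is evaluated before the join,
-- the non-deterministic rand() conjunct is still evaluated after it.
SELECT a.id
FROM (
  SELECT id, datestr
  FROM a
  WHERE datestr >= '2016-08-20'
) a
JOIN b
  ON a.id = b.id
WHERE rand() > 0.5;
{code}

Splitting the AND into its conjuncts and pushing only the deterministic ones is exactly what the issue title asks the optimizer to do automatically when CBO is disabled; this rewrite just performs that split manually.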