[
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005320#comment-15005320
]
Jesus Camacho Rodriguez commented on HIVE-11110:
------------------------------------------------
[~jpullokkaran], the patch changes look good to me.
I have a couple of small notes:
- In SqlFunctionConverter, why don't we handle "is not null" with the same
logic we use for "in", "between", and "row"? I.e., adding {{case IS_NOT_NULL}}
at line 205. Since we already recognize the function on the way in (line 323),
we should be able to do that and remove lines 212-217.
- Style nit: object name starting with a capital letter
({{OperandsToPushDown}} in {{HivePreFilteringRule}}).
> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules,
> improve Filter selectivity estimation
> ------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Jesus Camacho Rodriguez
> Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-10.patch, HIVE-11110-11.patch,
> HIVE-11110-12.patch, HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch,
> HIVE-11110.13.patch, HIVE-11110.14.patch, HIVE-11110.15.patch,
> HIVE-11110.16.patch, HIVE-11110.17.patch, HIVE-11110.18.patch,
> HIVE-11110.19.patch, HIVE-11110.2.patch, HIVE-11110.20.patch,
> HIVE-11110.21.patch, HIVE-11110.22.patch, HIVE-11110.23.patch,
> HIVE-11110.4.patch, HIVE-11110.5.patch, HIVE-11110.6.patch,
> HIVE-11110.7.patch, HIVE-11110.8.patch, HIVE-11110.9.patch,
> HIVE-11110.91.patch, HIVE-11110.92.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select count(*)
> from store_sales
> ,store_returns
> ,date_dim d1
> ,date_dim d2
> where d1.d_quarter_name = '2000Q1'
> and d1.d_date_sk = ss_sold_date_sk
> and ss_customer_sk = sr_customer_sk
> and ss_item_sk = sr_item_sk
> and ss_ticket_number = sr_ticket_number
> and sr_returned_date_sk = d2.d_date_sk
> and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used
> in a join clause. The join clause should add a filter “filterExpr:
> ss_sold_date_sk is not null”, which should get pushed to the MetaStore when
> fetching the stats. Currently this is not done in CBO planning, which results
> in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in
> the optimization phase. In particular, this increases the NDV for the join
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
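
The reasoning in the description can be sketched with a small toy example. An inner equi-join on a = b never matches a row where either side is NULL, so adding "is not null" on each join key does not change the result. The helper below is a hypothetical illustration only, not the code of HiveJoinAddNotNullRule; the function name and the string-based predicates are assumptions made for the sketch.

```python
def not_null_filters(equi_join_pairs, already_not_null=()):
    """Return 'col is not null' predicates implied by inner equi-join keys.

    Toy model (assumption): columns are plain strings, and every pair
    (left, right) stands for an inner-join condition left = right.
    """
    seen = set(already_not_null)  # columns that already carry the filter
    filters = []
    for left, right in equi_join_pairs:
        for col in (left, right):
            if col not in seen:  # emit each predicate only once
                seen.add(col)
                filters.append(f"{col} is not null")
    return filters


# Join conditions taken from the query in the description.
joins = [
    ("d1.d_date_sk", "ss_sold_date_sk"),
    ("ss_customer_sk", "sr_customer_sk"),
    ("ss_item_sk", "sr_item_sk"),
    ("ss_ticket_number", "sr_ticket_number"),
    ("sr_returned_date_sk", "d2.d_date_sk"),
]
filters = not_null_filters(joins)
for predicate in filters:
    print(predicate)
```

Among the derived predicates is "ss_sold_date_sk is not null", which is the filter on the partition column that the description expects to be pushed down when fetching partition stats.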