[jira] [Commented] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

Laljo John Pullokkaran (JIRA) Tue, 08 Sep 2015 15:56:34 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735779#comment-14735779
 ]


Laljo John Pullokkaran commented on HIVE-11110:
-----------------------------------------------

When a predicate involves both deterministic and non-deterministic UDFs, the 
deterministic pieces need to be pulled out.

Example: select a.* from srcpart a where rand(1) < 0.1 and a.ds = '2008-04-08' 
and not(key > 50 or key < 10) and a.hr like '%2';
This should be rewritten to push a.ds = '2008-04-08' and not(key > 50 or key < 
10) and a.hr like '%2';
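
As a rough illustration of the split described above (not Hive's actual 
implementation), the conjuncts of the example predicate can be partitioned 
into a pushable deterministic part and a residual non-deterministic part. 
The function names and the NON_DETERMINISTIC_FUNCS set are hypothetical, 
and the conjuncts are assumed to be already separated on the top-level AND:

```python
import re

# Assumption for this sketch: the only non-deterministic function is rand().
NON_DETERMINISTIC_FUNCS = {"rand"}

def is_deterministic(conjunct):
    """Return True if the conjunct calls no known non-deterministic function."""
    # Collect every identifier that is directly followed by "(".
    # Limitation: text inside string literals is not skipped.
    called = {name.lower()
              for name in re.findall(r"([A-Za-z_]\w*)\s*\(", conjunct)}
    return not (called & NON_DETERMINISTIC_FUNCS)

def partition_conjuncts(conjuncts):
    """Split conjuncts into (pushable, residual), preserving their order."""
    pushable = [c for c in conjuncts if is_deterministic(c)]
    residual = [c for c in conjuncts if not is_deterministic(c)]
    return pushable, residual

# Conjuncts of the WHERE clause from the example query.
conjuncts = [
    "rand(1) < 0.1",
    "a.ds = '2008-04-08'",
    "not(key > 50 or key < 10)",
    "a.hr like '%2'",
]
pushable, residual = partition_conjuncts(conjuncts)
print("push:", " and ".join(pushable))
print("keep:", " and ".join(residual))
```

Here only rand(1) < 0.1 ends up in the residual list; the other three 
conjuncts form the part that can be pushed.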

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11110
>                 URL: https://issues.apache.org/jira/browse/HIVE-11110
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Laljo John Pullokkaran
>         Attachments: HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.2.patch, HIVE-11110.4.patch, HIVE-11110.5.patch, 
> HIVE-11110.6.patch, HIVE-11110.7.patch, HIVE-11110.8.patch, 
> HIVE-11110.9.patch, HIVE-11110.91.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>      ,store_returns
>      ,date_dim d1
>      ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>    and d1.d_date_sk = ss_sold_date_sk
>    and ss_customer_sk = sr_customer_sk
>    and ss_item_sk = sr_item_sk
>    and ss_ticket_number = sr_ticket_number
>    and sr_returned_date_sk = d2.d_date_sk
>    and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used 
> in a join clause. The join clause should add a filter “filterExpr: 
> ss_sold_date_sk is not null”, which should get pushed to the MetaStore when 
> fetching the stats. Currently this is not done in CBO planning, which results 
> in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in 
> the optimization phase. In particular, this increases the NDV for the join 
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
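
The idea behind the not-null filter in the description above can be sketched 
as follows. This is only an illustration, not the code of 
HiveJoinAddNotNullRule; the helper name not_null_filters and the regular 
expression are assumptions. It relies on the fact that an SQL equality never 
evaluates to true when either side is NULL, so each column of an inner 
equi-join condition can safely be filtered with "is not null":

```python
import re

# Matches a simple equi-join condition `col_a = col_b`, where each side is a
# column name with an optional table alias. Literals (e.g. '2000Q1') and
# non-equality predicates do not match.
EQUI_JOIN = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*=\s*([A-Za-z_][\w.]*)\s*$")

def not_null_filters(conditions):
    """Derive `<col> is not null` filters from column-to-column equalities."""
    columns = []
    for cond in conditions:
        match = EQUI_JOIN.match(cond)
        if match is None:
            continue  # not a column = column condition
        for col in match.groups():
            if col not in columns:
                columns.append(col)  # keep first-seen order, no duplicates
    return [f"{col} is not null" for col in columns]

# Conditions of the WHERE clause from the query in the description.
conditions = [
    "d1.d_quarter_name = '2000Q1'",
    "d1.d_date_sk = ss_sold_date_sk",
    "ss_customer_sk = sr_customer_sk",
    "ss_item_sk = sr_item_sk",
    "ss_ticket_number = sr_ticket_number",
    "sr_returned_date_sk = d2.d_date_sk",
    "d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3')",
]
filters = not_null_filters(conditions)
for f in filters:
    print(f)
```

Among the derived filters is "ss_sold_date_sk is not null", the one on the 
partition column that the description wants applied when fetching stats.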



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
