[ 
https://issues.apache.org/jira/browse/HIVE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20332:
-------------------------------------------
    Status: Patch Available  (was: In Progress)

> Materialized views: Introduce heuristic on selectivity over ROW__ID to favour 
> incremental rebuild
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20332
>                 URL: https://issues.apache.org/jira/browse/HIVE-20332
>             Project: Hive
>          Issue Type: Improvement
>          Components: Materialized views
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-20332.patch
>
>
> Currently, we do not expose stats over {{ROW\_\_ID.writeId}} to the optimizer 
> (this should be fixed by HIVE-20313). Even if we did, we always assume 
> uniform distribution of the column values, which can easily lead to 
> overestimations on the number of rows read when we filter on 
> {{ROW\_\_ID.writeId}} for materialized views (think about a large transaction 
> for MV creation and then small ones for incremental maintenance). This 
> overestimation can lead to incremental view maintenance not being triggered 
> as cost of the incremental plan is overestimated (we think we will read more 
> rows than we actually do). This could be fixed by introducing histograms that 
> reflect better the column values distribution.
> Till both fixes are implemented, we will use a config variable that will set 
> the selectivity for filter condition on {{ROW\_\_ID}} during the cost 
> calculation. Setting that variable to a low value will favour incremental 
> rebuild over full rebuild.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to