[ https://issues.apache.org/jira/browse/HIVE-20382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-20382:
-------------------------------------------
Attachment: HIVE-20382.patch
> Materialized views: Introduce heuristic to favour incremental rebuild
> ---------------------------------------------------------------------
>
> Key: HIVE-20382
> URL: https://issues.apache.org/jira/browse/HIVE-20382
> Project: Hive
> Issue Type: Improvement
> Components: Materialized views
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-20382.patch
>
>
> Currently, we do not expose statistics on ROW__ID.writeId to the optimizer
> (HIVE-20313 should fix this). Even if we did, we always assume that column
> values are uniformly distributed, which can easily lead us to overestimate
> the number of rows read when we filter on ROW__ID.writeId for materialized
> views (think of one large transaction for MV creation followed by small
> ones for incremental maintenance). Because the cost of the incremental plan
> is overestimated (we think we will read more rows than we actually do),
> incremental view maintenance may not be triggered. Histograms that better
> reflect the distribution of column values would fix this.
> Until both fixes are implemented, we will use a config variable that
> multiplies the estimated cost of the rebuild plan, which makes it possible
> to favour incremental rebuild over full rebuild.
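
The effect described above can be illustrated with a small sketch. It is not
Hive code: the function names, the toy cost model (cost proportional to rows
read, with a hypothetical 1.5x per-row overhead for the incremental merge),
and the factor value are all assumptions chosen for illustration. The sketch
also assumes that the factor is applied to the incremental plan's cost.

```python
def uniform_rows_estimate(total_rows, min_write_id, max_write_id, last_seen_write_id):
    """Estimate rows with writeId > last_seen_write_id, assuming writeIds
    are uniformly distributed over [min_write_id, max_write_id]."""
    span = max_write_id - min_write_id + 1
    newer = max_write_id - last_seen_write_id
    return total_rows * newer / span


def choose_rebuild(incremental_cost, full_cost, factor):
    """Pick a plan after scaling the incremental cost by `factor`
    (hypothetical heuristic: a factor below 1 favours incremental rebuild)."""
    return "incremental" if incremental_cost * factor < full_cost else "full"


# One large transaction (writeId 1) loads 1,000,000 rows before the MV is
# created; nine small transactions (writeIds 2..10) then add 10 rows each.
total_rows = 1_000_000 + 9 * 10
actual_new_rows = 9 * 10

estimated_new_rows = uniform_rows_estimate(total_rows, 1, 10, last_seen_write_id=1)

# Toy cost model (assumption): full rebuild reads every row; incremental
# rebuild reads only the new rows but pays a 1.5x per-row merge overhead.
full_cost = total_rows
incremental_cost = estimated_new_rows * 1.5

print(actual_new_rows, estimated_new_rows)
print(choose_rebuild(incremental_cost, full_cost, factor=1.0))
print(choose_rebuild(incremental_cost, full_cost, factor=0.1))
```

In this toy setup the uniform assumption puts the estimate several orders of
magnitude above the 90 rows actually read, so the unscaled comparison picks
the full rebuild. Scaling the incremental cost down compensates for the bad
estimate without needing accurate statistics.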