[
https://issues.apache.org/jira/browse/CALCITE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-2828:
------------------------------------
Labels: pull-request-available (was: )
> Handle cost propagation properly in Volcano Planner
> ---------------------------------------------------
>
> Key: CALCITE-2828
> URL: https://issues.apache.org/jira/browse/CALCITE-2828
> Project: Calcite
> Issue Type: Bug
> Reporter: Siddharth Teotia
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
>
> When getCost(rel) is called, a node's nonCumulativeCost() is computed. When
> CachingRelMetadataProvider is used, metadata (rowCount, cost, etc.) is cached
> for future use. To make sure that we do not use stale metadata, each
> RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to
> invalidate the cache: if a cached entry has timestamp !=
> getRelMetadataTimestamp(rel), it is not used.
>
> The problem in this case was that VolcanoPlanner uses the timestamp of the
> rel's current RelSubset as getRelMetadataTimestamp(rel). Since a rel can
> belong to multiple RelSubsets, this results in inconsistent cache
> hits/misses. For example, suppose a rel belongs to RelSubset#1 and
> RelSubset#2, with relMetadataTimestamp of 1 and 2, respectively. If the rel
> happens to update its cost in the context of RelSubset#1 first, the cache is
> updated with timestamp 1, so when the same rel tries to look up its metadata
> in the context of RelSubset#2, the lookup fails. This results in inefficient
> use of the cache. The main problem occurs when we get incorrect cache hits
> (e.g. a previous iteration of a metadata query on RelSubset#2 populated the
> cache with timestamp 2, but later, in the context of RelSubset#1, we think
> there is a valid cache entry and use the stale metadata).
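
The two failure modes described above can be illustrated with a small standalone sketch. This is not Calcite code: TimestampCacheSketch, MetadataCache, and Entry are hypothetical names, the cache is a plain map keyed by a rel name, and the scenario for the incorrect hit (RelSubset#1's timestamp later advancing to 2) is one assumed reading of the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a timestamp-validated metadata cache (not Calcite code). */
public class TimestampCacheSketch {

    /** A cached cost together with the timestamp under which it was stored. */
    public static final class Entry {
        final long timestamp;
        final double cost;

        Entry(long timestamp, double cost) {
            this.timestamp = timestamp;
            this.cost = cost;
        }
    }

    /** One cache entry per rel, valid only if its timestamp matches the caller's. */
    public static final class MetadataCache {
        private final Map<String, Entry> entries = new HashMap<>();

        public void put(String rel, long timestamp, double cost) {
            entries.put(rel, new Entry(timestamp, cost));
        }

        /** Returns the cached cost, or null if there is no entry or the timestamps differ. */
        public Double lookup(String rel, long currentTimestamp) {
            Entry e = entries.get(rel);
            if (e == null || e.timestamp != currentTimestamp) {
                return null;
            }
            return e.cost;
        }
    }

    public static void main(String[] args) {
        MetadataCache cache = new MetadataCache();

        // Assumed starting state: the same rel is seen through two subsets
        // whose timestamps differ (1 for RelSubset#1, 2 for RelSubset#2).
        long subset1Timestamp = 1;
        long subset2Timestamp = 2;

        // Cost is computed in the context of RelSubset#1 and cached with timestamp 1.
        cache.put("rel", subset1Timestamp, 100.0);

        // Spurious miss: the entry is still correct, but the lookup made in the
        // context of RelSubset#2 compares against timestamp 2 and rejects it.
        System.out.println("lookup via subset 2: " + cache.lookup("rel", subset2Timestamp));

        // The cost is recomputed in the context of RelSubset#2 and cached with timestamp 2.
        cache.put("rel", subset2Timestamp, 100.0);

        // Assumption: RelSubset#1 changes afterwards, so its timestamp advances
        // to 2 and the cached cost of 100.0 no longer reflects the rel.
        subset1Timestamp = 2;

        // Incorrect hit: the timestamps happen to match, so the stale cost is returned.
        System.out.println("lookup via subset 1: " + cache.lookup("rel", subset1Timestamp));
    }
}
```

In this model the validity check depends on which subset the lookup goes through rather than on the rel itself, which is why both the wasted recomputation and the stale hit can occur for the same entry.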
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)