Siddharth Teotia created CALCITE-2828:
-----------------------------------------
Summary: Handle cost propagation properly in Volcano Planner
Key: CALCITE-2828
URL: https://issues.apache.org/jira/browse/CALCITE-2828
Project: Calcite
Issue Type: Bug
Reporter: Siddharth Teotia
Assignee: Julian Hyde
When getCost(rel) is called, the node's nonCumulativeCost() is computed. When
CachingRelMetadataProvider is used, metadata (rowCount, cost, etc.) is cached
for future use. To make sure that stale metadata is not used, each
RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to
invalidate the cache: if a cached entry's timestamp !=
getRelMetadataTimestamp(rel), the entry is not used.
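
The timestamp check described above can be sketched roughly as follows. This
is only an illustration of the idea, not Calcite's CachingRelMetadataProvider:
the class TimestampedCache, its get() method, the string keys, and the
computations counter are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of a cache whose entries are validated by timestamp. */
final class TimestampedCache {
  /** A cached value together with the timestamp at which it was stored. */
  private static final class Entry {
    final long timestamp;
    final double value;

    Entry(long timestamp, double value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  /** Number of times the value had to be (re)computed. */
  int computations = 0;

  /**
   * Returns the cached value only if it was stored under the same timestamp
   * the caller reports now; otherwise recomputes and overwrites the entry.
   */
  double get(String key, long currentTimestamp, Supplier<Double> compute) {
    Entry e = entries.get(key);
    if (e != null && e.timestamp == currentTimestamp) {
      return e.value; // timestamps match: entry is treated as fresh
    }
    computations++;
    double v = compute.get();
    entries.put(key, new Entry(currentTimestamp, v));
    return v;
  }
}
```

In this sketch, a second lookup with the same timestamp is served from the
cache, while a lookup with any other timestamp forces a recomputation.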
The problem in this case was that VolcanoPlanner uses the timestamp of the
rel's current RelSubset as getRelMetadataTimestamp(). Since a rel can belong
to multiple RelSubsets, this results in inconsistent cache hits and misses.
For example, suppose a rel belongs to RelSubset#1 and RelSubset#2, with
relMetadataTimestamp of 1 and 2, respectively. If the rel happens to update
its cost in the context of RelSubset#1 first, the cache is updated with
timestamp 1, so when the same rel, in the context of RelSubset#2, tries to
look up its metadata, the lookup fails. This results in inefficient use of the
cache. The main problem occurs when we get incorrect cache hits (e.g. a
previous iteration of a metadata query on RelSubset#2 populated the cache with
timestamp 2, but later, in the context of RelSubset#1, we think there is a
valid cache entry and use the stale metadata).
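
Both failure modes can be reproduced with a toy model. The sketch below is not
VolcanoPlanner code: SubsetTimestampDemo, its cost() method, and the numeric
costs are hypothetical, and the stale-hit case assumes that RelSubset#1's
timestamp later advances to 2 while the rel's true cost has changed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of a per-rel cache keyed by a subset's timestamp. */
final class SubsetTimestampDemo {
  private final Map<String, Long> cachedTimestamp = new HashMap<>();
  private final Map<String, Double> cachedValue = new HashMap<>();

  /** Number of lookups that did not find a usable entry. */
  int misses = 0;

  /**
   * Looks up the cost of {@code rel}. {@code subsetTimestamp} is the timestamp
   * of whichever subset the rel is currently viewed through; {@code freshCost}
   * is what a real recomputation would return at this moment.
   */
  double cost(String rel, long subsetTimestamp, double freshCost) {
    Long ts = cachedTimestamp.get(rel);
    if (ts != null && ts == subsetTimestamp) {
      return cachedValue.get(rel); // treated as valid, even if actually stale
    }
    misses++;
    cachedTimestamp.put(rel, subsetTimestamp);
    cachedValue.put(rel, freshCost);
    return freshCost;
  }

  public static void main(String[] args) {
    SubsetTimestampDemo cache = new SubsetTimestampDemo();

    // Spurious misses: alternating subset contexts keep evicting each other.
    cache.cost("rel", 1L, 10.0); // via RelSubset#1
    cache.cost("rel", 2L, 10.0); // via RelSubset#2: timestamp mismatch
    cache.cost("rel", 1L, 10.0); // via RelSubset#1 again: mismatch again
    System.out.println("misses = " + cache.misses);

    // Incorrect hit: RelSubset#2 stores an entry with timestamp 2 ...
    cache.cost("rel", 2L, 10.0);
    // ... then RelSubset#1 (assumed to have advanced to timestamp 2) asks,
    // although the true cost is now 7.0.
    double seen = cache.cost("rel", 2L, 7.0);
    System.out.println("cost seen via RelSubset#1 = " + seen);
  }
}
```

The point of the model is that the timestamp stored in the cache identifies a
subset's state rather than the rel's own state, so matching timestamps say
nothing reliable about whether the rel's metadata is still current.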
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)