StevenMPhillips opened a new pull request #1054: [CALCITE-2828] Fix VolcanoPlanner.validate() to handle cost propagation properly
URL: https://github.com/apache/calcite/pull/1054

   When getCost(rel) is called, a node's nonCumulativeCost() is computed. When
   CachingRelMetadataProvider is used, metadata (rowCount, cost, etc.) is cached
   for future use. To make sure that we do not use stale metadata, each
   RelOptPlanner provides getRelMetadataTimestamp(rel), which is used to
   invalidate the cache: if a cached entry's timestamp !=
   getRelMetadataTimestamp(rel), the entry is not used.
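   A minimal sketch of the timestamp check described above, assuming a cache
   keyed by rel that stores each value together with the timestamp it was
   computed under. `TimestampedCache` and its `get` method are hypothetical
   names for illustration; this is not CachingRelMetadataProvider's actual
   implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Calcite's CachingRelMetadataProvider): a metadata
 * cache whose entries are trusted only when their stored timestamp equals the
 * timestamp the planner currently reports for the rel.
 */
final class TimestampedCache<K, V> {
  /** A cached value together with the timestamp it was computed under. */
  private static final class Entry<T> {
    final T value;
    final long timestamp;

    Entry(T value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  int misses = 0; // number of recomputations, for illustration

  /** Returns the cached value if still current, else recomputes and re-stamps it. */
  V get(K key, long currentTimestamp, Supplier<V> compute) {
    Entry<V> e = entries.get(key);
    if (e != null && e.timestamp == currentTimestamp) {
      return e.value; // hit: timestamps match, so the entry is treated as fresh
    }
    misses++; // miss: no entry, or it was stored under a different timestamp
    V v = compute.get();
    entries.put(key, new Entry<>(v, currentTimestamp));
    return v;
  }
}
```

   For example, a second `get("rel#7", 1L, ...)` returns the cached value
   without recomputing, while `get("rel#7", 2L, ...)` recomputes and re-stamps
   the entry with timestamp 2.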
   
   The problem in this case was that VolcanoPlanner uses the rel's current
   RelSubset's timestamp as getRelMetadataTimestamp(). Since a rel can belong to
   multiple RelSubsets, this results in inconsistent cache hits and misses. For
   example, suppose a rel belongs to RelSubset#1 and RelSubset#2, with
   relMetadataTimestamps of 1 and 2, respectively. If the rel happens to update
   its cost in the context of RelSubset#1 first, the cache entry is stored with
   timestamp 1, so when the same rel tries to look up its metadata in the
   context of RelSubset#2, the lookup misses. This results in inefficient use
   of the cache. The more serious problem occurs when we get incorrect cache
   hits: for example, a previous metadata query in the context of RelSubset#2
   populated the cache with timestamp 2, but later, in the context of
   RelSubset#1 (once its timestamp also equals 2), we treat the cached entry as
   valid and use the stale metadata.
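   The scenario above can be replayed with a small simulation. It assumes a
   cache keyed by the rel alone and a cost that depends on one mutable input;
   `SubsetTimestampDemo`, `cost`, and `replay` are hypothetical names, and the
   timestamp values 1 and 2 are taken from the example. This sketches the
   failure mode only; it is not VolcanoPlanner code and not the fix in this
   pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical simulation (not Calcite code): one rel that belongs to two
 * subsets, with its cached cost validated against whichever subset's
 * timestamp the caller happens to pass in.
 */
final class SubsetTimestampDemo {
  /** A cached cost and the timestamp it was stored under. */
  static final class Entry {
    final double cost;
    final long timestamp;

    Entry(double cost, long timestamp) {
      this.cost = cost;
      this.timestamp = timestamp;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>(); // keyed by rel only
  int misses = 0;

  /** Returns the cached cost if its timestamp matches; otherwise recomputes. */
  double cost(String rel, long subsetTimestamp, Supplier<Double> compute) {
    Entry e = cache.get(rel);
    if (e != null && e.timestamp == subsetTimestamp) {
      return e.cost;
    }
    misses++;
    double c = compute.get();
    cache.put(rel, new Entry(c, subsetTimestamp));
    return c;
  }

  /** Replays the scenario; returns {misses before the change, cost seen after it}. */
  static double[] replay() {
    SubsetTimestampDemo d = new SubsetTimestampDemo();
    double[] inputCost = {5.0};                 // the rel's cost is inputCost + 1
    Supplier<Double> compute = () -> inputCost[0] + 1;
    long subset1 = 1, subset2 = 2;              // per-subset metadata timestamps

    // Inefficiency: nothing changes, yet alternating contexts misses every time.
    d.cost("rel", subset1, compute);
    d.cost("rel", subset2, compute);
    d.cost("rel", subset1, compute);
    d.cost("rel", subset2, compute);            // entry now stored with timestamp 2
    double missesBeforeChange = d.misses;

    // A cheaper input is found and RelSubset#1's timestamp advances to 2.
    inputCost[0] = 3.0;
    subset1 = 2;

    // Incorrect hit: timestamp 2 matches, so the old cost 6.0 is returned
    // even though recomputing would give 4.0.
    double seen = d.cost("rel", subset1, compute);
    return new double[] {missesBeforeChange, seen};
  }
}
```

   `replay()` returns `{4.0, 6.0}`: four misses while nothing changed, then a
   hit that returns the pre-change cost 6.0 where recomputation would give 4.0.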
   
   Change-Id: Iefb630f5813ba497b7fbc0144c8fd6050e59b1a3

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
