[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097006#comment-17097006
 ] 

Xiening Dai commented on CALCITE-3963:
--------------------------------------

What I mean by "maintain" is more about associating these properties with 
RelSet rather than RelNode. They can still be stored in the metadata cache 
somehow, which would be an implementation detail. But conceptually they should 
belong to RelSet.

For example, when calculating the row count for a RelSubset, the logic today is 
to use the row count of subset.best, and if best is not available, to use the 
row count of the first rel in the set. This logic is flawed in my opinion. 
Essentially, the row count should be consistent across the entire set, and 
should only change when a new logical node is added to the set or the set gets 
merged. We shouldn't rely on the first rel or the subset's best.
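
To make the contrast concrete, here is a minimal sketch in plain Java. It is a toy model, not Calcite code: Rel, EquivSet, rowCountPerNode, rowCountPerSet and the "smallest estimate" derivation policy are all hypothetical names and assumptions made for illustration, and the set and subset are collapsed into one class for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; these are not Calcite's real RelNode/RelSet classes. */
final class RowCountSketch {

  /** Hypothetical stand-in for a rel with its own row count estimate. */
  static final class Rel {
    final String name;
    final double rowCount;

    Rel(String name, double rowCount) {
      this.name = name;
      this.rowCount = rowCount;
    }
  }

  /** Hypothetical stand-in for an equivalence set (set and subset collapsed). */
  static final class EquivSet {
    private final List<Rel> rels = new ArrayList<>();
    private Rel best;               // cheapest rel found so far, may be null
    private Double cachedRowCount;  // set-level value, null when stale

    /** Adding a logical rel invalidates the set-level value. */
    void addLogical(Rel rel) {
      rels.add(rel);
      cachedRowCount = null;
    }

    /** Changing best is a physical decision; the cached value is kept. */
    void setBest(Rel rel) {
      best = rel;
    }

    /** Merging another set invalidates the set-level value. */
    void mergeWith(EquivSet other) {
      rels.addAll(other.rels);
      cachedRowCount = null;
    }

    /** Node-level lookup as described above: best, else the first rel. */
    double rowCountPerNode() {
      return best != null ? best.rowCount : rels.get(0).rowCount;
    }

    /** Set-level lookup: derived once, reused until the set changes. */
    double rowCountPerSet() {
      if (cachedRowCount == null) {
        cachedRowCount = derive();
      }
      return cachedRowCount;
    }

    /** Placeholder policy (an assumption): take the smallest estimate. */
    private double derive() {
      if (rels.isEmpty()) {
        throw new IllegalStateException("empty set");
      }
      double min = Double.POSITIVE_INFINITY;
      for (Rel rel : rels) {
        min = Math.min(min, rel.rowCount);
      }
      return min;
    }
  }
}
```

With two rels estimating 100 and 80 rows, rowCountPerNode() answers 100 before best is set and 80 once the second rel becomes best, while rowCountPerSet() answers 80 both times under the placeholder policy.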

One of the clear benefits, which Haisheng already mentioned, is to save a large 
amount of cache memory and avoid unnecessary recalculation. But more 
importantly, we plug this hole in the conceptual design.

In terms of how we derive logical properties for the set, I think in a lot of 
cases we don't "aggregate" inputs from the nodes; more likely we choose the 
most convincing, or promising, node to report the stat. In the "unique keys" 
example you mentioned, do you have a real-world case where RelNodes within one 
set have different unique keys?

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> per the definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level, rather than the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> changes.
> Specifically, I think the built-in metadata below should fall into the logical 
> properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
