[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126449#comment-17126449
 ] 

Julian Hyde commented on CALCITE-3963:
--------------------------------------

OK, it's deterministic, but it's not robust. With narrow consistency, your test 
will give the same results run after run, but the results may change if someone 
changes the code - say adds an extra planner rule, or adds an index.

Usually in our field it's a cause for celebration when we discover nice 
algebraic properties such as 
[semigroups|https://en.wikipedia.org/wiki/Semigroup] or monoids. Those 
properties can be exploited to make computation more scalable and robust. 
(Consider how we can roll up measures in aggregate tables.)
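
To make the roll-up point concrete, here is a minimal sketch in Java. It 
assumes a made-up Measure class (not part of Calcite) that holds a partial 
count, sum, min and max. Because combine is associative and has an identity 
element (EMPTY), the type forms a monoid, and partial aggregates can be merged 
in any grouping with the same result.

```java
import java.util.List;

/** Hypothetical partial aggregate for one measure; not a Calcite class. */
final class Measure {
  /** Identity element: combining with EMPTY leaves any Measure unchanged. */
  static final Measure EMPTY = new Measure(0, 0, Long.MAX_VALUE, Long.MIN_VALUE);

  final long count;
  final long sum;
  final long min;
  final long max;

  Measure(long count, long sum, long min, long max) {
    this.count = count;
    this.sum = sum;
    this.min = min;
    this.max = max;
  }

  /** Wraps a single raw value as a one-row partial aggregate. */
  static Measure of(long value) {
    return new Measure(1, value, value, value);
  }

  /** Associative and commutative, so partial results merge in any order. */
  Measure combine(Measure other) {
    return new Measure(count + other.count, sum + other.sum,
        Math.min(min, other.min), Math.max(max, other.max));
  }

  /** Rolls up finer-grained partial aggregates into one coarser aggregate. */
  static Measure rollUp(List<Measure> parts) {
    Measure result = EMPTY;
    for (Measure part : parts) {
      result = result.combine(part);
    }
    return result;
  }
}
```

Rolling up daily partials into a monthly total this way gives the same answer 
whether the days are merged one by one or week by week.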

I concede that row count estimates are not semigroups. But most of our 
statistics are semigroups - I think we should exploit that fact.

Maybe row counts can be treated specially - say, use the first rel in a 
set, and take the min when you have significant new information (e.g. you've 
hit a materialized view) or when you merge sets. Row counts are based on other 
statistics, and if those are maintained using robust combiners then row counts 
will become a bit more robust.
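
A rough sketch of that row count rule, in Java. The SetRowCount class and its 
method names are assumptions made up for illustration; Calcite's RelSet has no 
such API, and what counts as "significant new information" is left to the 
caller. Taking the min is associative, commutative and idempotent, so merging 
sets is order-independent; only the "first rel wins" step still depends on the 
order in which rels arrive.

```java
/** Hypothetical per-set row count holder; not a Calcite class. */
final class SetRowCount {
  private boolean known;   // false until the first rel has been registered
  private double rowCount; // meaningful only when known is true

  /**
   * Records the estimate of a rel added to the set.
   *
   * @param estimate row count estimate derived from the new rel
   * @param significantNewInfo true if the rel carries better information,
   *     e.g. it scans a materialized view (decided by the caller)
   */
  void onRelAdded(double estimate, boolean significantNewInfo) {
    if (!known) {
      rowCount = estimate; // the first rel in the set provides the estimate
      known = true;
    } else if (significantNewInfo) {
      rowCount = Math.min(rowCount, estimate);
    }
    // Otherwise keep the existing estimate, so that adding a planner rule
    // does not perturb the value.
  }

  /** Merges another set into this one, keeping the smaller estimate. */
  void mergeFrom(SetRowCount other) {
    if (!other.known) {
      return;
    }
    rowCount = known ? Math.min(rowCount, other.rowCount) : other.rowCount;
    known = true;
  }

  /** Returns the current estimate; fails if no rel has been registered. */
  double get() {
    if (!known) {
      throw new IllegalStateException("no row count registered yet");
    }
    return rowCount;
  }
}
```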

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the logical properties (such as row count, distinct row count, etc) 
> are maintained at RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> per definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the following built-in metadata should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
