[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137905#comment-17137905
 ] 

Julian Hyde commented on CALCITE-3963:
--------------------------------------

bq. How would you define satisfactory definition? The confidence level is 
provided by RelNode, and they can be customized to reflect the accuracy level 
of estimations.

Consider a RelSet that contains two RelNodes: a Project that is based on a 
ten-way join, and a Join of two table scans. Which of the row-count estimates 
has higher confidence? A Project just passes through its input's row count, so 
the Project is deemed high-confidence, but its estimate is actually garbage 
because no one knows how many rows will come out of a ten-way join.

The calculations of confidence level CAN be customized, but I predict that 
they will not be.

To calculate confidence levels properly, we would have to take into account 
not just the definition of the RelNode but also the confidence level of every 
statistic it is based upon. In effect, we would be making all of our 
statistics stochastic. It would be a big project, and I am skeptical that we 
could accomplish it without a testing strategy and a process to monitor and 
tune the formulas that we use.
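
To make the propagation problem concrete, here is a minimal sketch in Java. It 
is not Calcite code: Estimate, scan, join, project and tenWayJoin are 
hypothetical names, and both the 0.001 selectivity and the 0.5 confidence 
assigned to each guessed selectivity are arbitrary assumptions chosen only for 
illustration. It shows one possible rule, in which an operator's confidence is 
the product of its inputs' confidences and the confidence of any statistic it 
guesses.

```java
public class ConfidenceSketch {
    /** Hypothetical value type: a row-count estimate plus a confidence in [0, 1]. */
    static final class Estimate {
        final double rowCount;
        final double confidence;

        Estimate(double rowCount, double confidence) {
            this.rowCount = rowCount;
            this.confidence = confidence;
        }
    }

    /** A table scan: assume the row count comes from exact table statistics. */
    static Estimate scan(double rows) {
        return new Estimate(rows, 1.0);
    }

    /**
     * A join: the row count depends on a guessed selectivity, so the result
     * inherits the uncertainty of both inputs and of the guess itself.
     */
    static Estimate join(Estimate left, Estimate right,
                         double selectivity, double selectivityConfidence) {
        return new Estimate(
            left.rowCount * right.rowCount * selectivity,
            left.confidence * right.confidence * selectivityConfidence);
    }

    /** A project: passes the row count through, and the input's confidence with it. */
    static Estimate project(Estimate input) {
        return new Estimate(input.rowCount, input.confidence);
    }

    /** Builds a ten-way join (nine joins over ten scans) with assumed numbers. */
    static Estimate tenWayJoin() {
        Estimate result = scan(1000);
        for (int i = 0; i < 9; i++) {
            result = join(result, scan(1000), 0.001, 0.5);
        }
        return result;
    }

    public static void main(String[] args) {
        Estimate twoWay = join(scan(1000), scan(1000), 0.001, 0.5);
        Estimate projected = project(tenWayJoin());
        System.out.printf("Join of two scans:        confidence=%.6f%n",
            twoWay.confidence);
        System.out.printf("Project over ten-way join: confidence=%.6f%n",
            projected.confidence);
    }
}
```

Under these assumptions the Project ends up with far lower confidence than the 
two-way Join, which is the opposite of what a rule based only on the RelNode's 
own type would report. Whether multiplying confidences is the right formula is 
exactly the kind of question that would need testing and tuning.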

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties, 
> per the definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the built-in metadata below should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
