[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140741#comment-17140741
 ] 

Xiening Dai commented on CALCITE-3963:
--------------------------------------

{quote}
For sets of predicates and unique keys, the operation is union. For 
minRowCount, the operation is max.
{quote}

Do you have a concrete example where different RelNodes in a set have different 
unique keys and a union of those would make sense? Regarding minRowCount, 
how do we know the max value is the best or most accurate? In your 
hypothetical example, the Project vs. Join case, if Project reports a large 
minRowCount, do you just pick the one from Project? How would this solve the 
problem?
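
To make the question concrete, the merge rule quoted above (union for unique keys, max for minRowCount) could be sketched roughly as below. This is only an illustration: `SetStatsSketch`, `SetStats` and `merge` are hypothetical names, not Calcite APIs, and the column ordinals and row counts are made up. It assumes every member of the set is truly equivalent, so that any key reported by one member holds for all of them and every reported minimum is a valid lower bound.

```java
import java.util.HashSet;
import java.util.Set;

public class SetStatsSketch {
  /** Hypothetical set-level stats holder; not part of Calcite. */
  static final class SetStats {
    final Set<Set<Integer>> uniqueKeys; // each key is a set of column ordinals
    final double minRowCount;

    SetStats(Set<Set<Integer>> uniqueKeys, double minRowCount) {
      this.uniqueKeys = uniqueKeys;
      this.minRowCount = minRowCount;
    }

    /** Merges stats reported by two members of the same equivalence set. */
    SetStats merge(SetStats other) {
      // Union: a key reported by either member is kept.
      Set<Set<Integer>> keys = new HashSet<>(uniqueKeys);
      keys.addAll(other.uniqueKeys);
      // Max: the larger lower bound is the tighter one.
      double min = Math.max(minRowCount, other.minRowCount);
      return new SetStats(keys, min);
    }
  }

  public static void main(String[] args) {
    // Made-up inputs: one member knows column 0 is unique, the other column 1.
    SetStats fromJoin = new SetStats(Set.of(Set.of(0)), 0d);
    SetStats fromProject = new SetStats(Set.of(Set.of(1)), 10d);
    SetStats merged = fromJoin.merge(fromProject);
    System.out.println(merged.uniqueKeys + " minRowCount=" + merged.minRowCount);
  }
}
```

Note that the result is only as good as its inputs: if one member over-reports minRowCount, max propagates that value to the whole set, which is exactly the concern raised above.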

I tend to agree with you that we might need to consider the input confidence 
when reporting the current estimate's confidence. But as you said, doing so 
would greatly complicate the solution, and it doesn't seem necessary at this 
point. In practice, the example you gave is not a problem. A MultiJoin has low 
confidence, and its RelSet stats will be replaced when it is converted into a 
LogicalJoin, which gives a better estimate. This change would propagate to its 
parent Project node, so the Project stats should eventually be the same as the 
Join's.
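
A minimal sketch of the replacement behavior described above, assuming the set keeps a single estimate and swaps it out only when a higher-confidence one arrives. The names `ConfidenceSketch`, `Estimate`, `SetRowCount` and `offer` are hypothetical, and the confidence levels and row counts are made-up numbers; none of this is existing Calcite code.

```java
public class ConfidenceSketch {
  /** Hypothetical estimate tagged with the confidence of its source. */
  static final class Estimate {
    final String source;
    final double rowCount;
    final int confidence; // higher means more trustworthy

    Estimate(String source, double rowCount, int confidence) {
      this.source = source;
      this.rowCount = rowCount;
      this.confidence = confidence;
    }
  }

  /** Hypothetical set-level holder that keeps the most confident estimate. */
  static final class SetRowCount {
    private Estimate best;

    /** Returns true if the offered estimate replaced the current one. */
    boolean offer(Estimate e) {
      if (best == null || e.confidence > best.confidence) {
        best = e;
        return true; // a caller would re-derive parent stats at this point
      }
      return false;
    }

    Estimate get() {
      return best;
    }
  }

  public static void main(String[] args) {
    SetRowCount stats = new SetRowCount();
    stats.offer(new Estimate("MultiJoin", 1000d, 1));  // low confidence
    stats.offer(new Estimate("LogicalJoin", 500d, 5)); // better estimate wins
    System.out.println(stats.get().source + " " + stats.get().rowCount);
  }
}
```

In this sketch a `true` return from `offer` is where the change would be propagated to parent sets, such as the one containing the Project.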

> Maintain logical properties at RelSet (equivalent group) instead of RelNode
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> by definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level rather than the RelNode level. Such 
> properties shouldn't change when a new subset is created or a subset's best is 
> changed.
> Specifically, I think the following built-in metadata should fall into the 
> logical properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
