[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096985#comment-17096985
 ] 

Haisheng Yuan commented on CALCITE-3963:
----------------------------------------

As long as all the alternatives in a RelSet share the same logical properties, 
we don't care where the logical properties are stored.

I am afraid the 'fold' operator will make things complicated. What about 
cardinality and selectivity? We may just end up choosing one blindly. It 
doesn't seem right to use alternative 1's cardinality info, alternative 2's 
selectivity info, and all the alternatives' unique keys ...
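
To make the concern concrete, here is a minimal sketch. It assumes the 'fold' 
keeps the smallest estimate of each property independently; AltStats, 
FoldSketch and the numbers are hypothetical and are not Calcite classes or 
real estimates. The folded result pairs one alternative's row count with 
another's selectivity, so it matches no single alternative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical container for one alternative's estimates; not a Calcite class.
final class AltStats {
  final double rowCount;
  final double selectivity;

  AltStats(double rowCount, double selectivity) {
    this.rowCount = rowCount;
    this.selectivity = selectivity;
  }
}

final class FoldSketch {
  // Assumed fold: keep the smallest value of each property independently.
  static AltStats fold(List<AltStats> alternatives) {
    double rowCount = Double.POSITIVE_INFINITY;
    double selectivity = Double.POSITIVE_INFINITY;
    for (AltStats s : alternatives) {
      rowCount = Math.min(rowCount, s.rowCount);
      selectivity = Math.min(selectivity, s.selectivity);
    }
    return new AltStats(rowCount, selectivity);
  }

  // True if the folded result equals the estimates of some single alternative.
  static boolean matchesSomeAlternative(AltStats folded, List<AltStats> alternatives) {
    for (AltStats s : alternatives) {
      if (s.rowCount == folded.rowCount && s.selectivity == folded.selectivity) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Made-up estimates for two alternatives of the same group.
    List<AltStats> alts = Arrays.asList(new AltStats(1000, 0.5), new AltStats(4000, 0.1));
    AltStats folded = fold(alts);
    System.out.println(folded.rowCount + " / " + folded.selectivity);
    System.out.println(matchesSomeAlternative(folded, alts));
  }
}
```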

Admittedly, each alternative's stats may vary a lot. One of the reasons is 
that Calcite believes all the simplification should be done in VolcanoPlanner 
and selected based on cost, while other systems like SQL Server and Greenplum 
do all the simplification, such as constant folding, join simplification, and 
predicate push-down, before the logical plan goes into the MEMO.

One of the reasons to share logical properties between alternatives in a group 
is that it becomes possible (in the future) to make an early decision to stop 
exploring this group. If we use the 'fold' operator to decide the group's 
logical properties, when is a good time to decide? 

Option 1: recompute the logical properties whenever there is a new 
alternative. That may be no better than just storing logical properties for 
each RelNode.

Option 2: roll it up after all the logical alternatives are generated. But 
there is no logical / physical distinction, so we don't know whether an 
operator is logical or not. Judging by convention is not perfect, because 
systems like Flink, Drill, and Ignite define their own logical conventions. 
There is no distinction between logical rules and physical rules either; they 
are matched and applied at the same stage. Physical rules can even generate 
logical operators, like ProjectMergeRule. Will these generated logical 
operators be counted?

Another reason to share logical properties is to avoid redundant computation. 
For example,
{code:java}
SELECT a,b,c,max(d) FROM foo GROUP BY a,b,c;

HashAggregate
  +-- TableScan
{code}
In a distributed system, suppose we generate HashAgg with distribution 
alternatives for all 8 key combinations. In SQL Server, there is only 1 
physical operator HashAgg, but in Calcite, there are 8 HashAgg operators: the 
same HashAgg with different traitsets. We will get another 8 exchange 
operators (in Calcite 1.22 and before, there were more than 50 exchange 
operators). We need to compute the logical properties for all the HashAgg and 
Exchange operators. Even though the result is cached in the metadata system, 
that work is wasted, because the same properties are already available from 
the LogicalAggregate operator.
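
A rough sketch of the saving, under two assumptions: the 8 combinations are 
the subsets of the grouping keys {a, b, c} (including the empty set), and 
deriving a logical property is a single call counted by a hypothetical 
counter. SharedPropsSketch and its methods are illustrative names, not 
Calcite APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedPropsSketch {
  // Counts how often the (assumed expensive) derivation runs.
  static int derivations = 0;

  // All subsets of the grouping keys, including the empty one: 2^n candidates.
  static List<List<String>> keyCombinations(List<String> keys) {
    List<List<String>> result = new ArrayList<>();
    for (int mask = 0; mask < (1 << keys.size()); mask++) {
      List<String> subset = new ArrayList<>();
      for (int i = 0; i < keys.size(); i++) {
        if ((mask & (1 << i)) != 0) {
          subset.add(keys.get(i));
        }
      }
      result.add(subset);
    }
    return result;
  }

  // Placeholder derivation; the returned constant is made up for illustration.
  static double deriveRowCount() {
    derivations++;
    return 100.0;
  }

  // Per node: every physical alternative triggers its own derivation.
  static int perNode(List<List<String>> alternatives) {
    derivations = 0;
    for (List<String> alt : alternatives) {
      deriveRowCount();
    }
    return derivations;
  }

  // Per group: the value is derived once and shared by all alternatives.
  static int perGroup(List<List<String>> alternatives) {
    derivations = 0;
    Map<Integer, Double> rowCountByGroup = new HashMap<>();
    int groupId = 0; // all alternatives belong to the same equivalence group
    for (List<String> alt : alternatives) {
      rowCountByGroup.computeIfAbsent(groupId, id -> deriveRowCount());
    }
    return derivations;
  }

  public static void main(String[] args) {
    List<List<String>> alts = keyCombinations(Arrays.asList("a", "b", "c"));
    System.out.println(alts.size());
    System.out.println(perNode(alts));
    System.out.println(perGroup(alts));
  }
}
```

The same reasoning would apply to the exchange operators generated on top of 
each HashAgg alternative.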

> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.) 
> are maintained at the RelNode level. This creates a number of metadata 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> per the definition of relational equivalence. So it makes more sense to keep 
> logical properties at the RelSet level, rather than the RelNode. And such 
> properties shouldn't change when a new subset is created or a subset's best 
> is changed.
> Specifically, I think the built-in metadata below should fall into the 
> logical properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
