[ 
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096889#comment-17096889
 ] 

Julian Hyde commented on CALCITE-3963:
--------------------------------------

Minor quibble: in the JIRA subject, use the imperative form of the verb 
("Maintain") rather than the third-person singular ("Maintains").

When you say "maintain" do you mean "store"? I'm not sure I agree. The 
metadata system allows us to derive a property for any {{RelNode}} (e.g. 
calling {{RelMetadataQuery.getUniqueKeys(RelNode rel, boolean ignoreNulls)}} 
on a particular {{LogicalProject}}) and it also maintains a cache, so that once 
derived, the value does not have to be re-computed.
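
To illustrate "derive on demand, then cache", here is a minimal standalone sketch. It is a toy model, not Calcite's implementation: {{CachingMetadataSketch}}, its {{deriver}} function and the {{derivations}} counter are made-up names, and a string stands in for a {{RelNode}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of "derive on demand, then cache"; not Calcite's implementation. */
final class CachingMetadataSketch<N, V> {
  private final Function<N, V> deriver;          // hypothetical derivation rule
  private final Map<N, V> cache = new HashMap<>();
  private int derivations;                       // counts real computations

  CachingMetadataSketch(Function<N, V> deriver) {
    this.deriver = deriver;
  }

  /** Returns the cached value for the node, deriving it first if needed. */
  V get(N node) {
    V value = cache.get(node);
    if (value == null) {
      // Not cached yet (a null result would be re-derived each time).
      value = deriver.apply(node);
      derivations++;
      cache.put(node, value);
    }
    return value;
  }

  int derivations() {
    return derivations;
  }

  public static void main(String[] args) {
    CachingMetadataSketch<String, Integer> mq =
        new CachingMetadataSketch<>(String::length);
    mq.get("LogicalProject");
    mq.get("LogicalProject");   // served from the cache
    System.out.println(mq.derivations());
  }
}
```

The caller only ever asks for the value; whether it was stored or freshly derived is an internal detail, and the second call above leaves the derivation counter at 1.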

So, the metadata system allows us to not worry too much about whether values 
are stored, which is good.

Now, let's suppose that you want to know the unique keys of a particular 
{{RelSet}} (or {{RelSubset}} - the reasoning is similar). Unique keys are a 
logical property, so we should be able to derive the set of unique keys by 
taking the union of the unique keys of every {{RelNode}} in that set.

If you add a {{RelNode}} to a set, or merge sets, then the set may acquire 
additional unique keys. And those keys may cause changes to unique keys (and 
other metadata) for any {{RelNode}} that consumes any {{RelNode}} in the set. 
It's complicated, so we should lean on the metadata system to maintain 
everything for us.

I think we need to add a 'fold' operator to each type of metadata to say how 
the metadata of the {{RelSet}} is derived from those of the constituent nodes. 
In the case of {{RelMdUniqueKeys}} the fold operator is 'union'. (In SQL terms, 
the 'fold' operator would be called a 'roll up', that is, an aggregate 
function. {{RelMdMinRowCount}} rolls up using {{MAX}}. Et cetera.)
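
A minimal standalone sketch of the fold idea follows. Nothing here is Calcite API: {{MetadataFoldSketch}}, {{fold}} and {{unionKeys}} are hypothetical names, a unique key is modelled as a set of column ordinals, and the per-node values are assumed to have been derived already. MAX fits the minimum row count because every node's value is a lower bound on the same relation, so the largest one is the tightest.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

/** Hypothetical sketch: roll up per-node metadata into set-level metadata. */
final class MetadataFoldSketch {
  private MetadataFoldSketch() {}

  /** Combines the values of all nodes in a (non-empty) set with one operator. */
  static <V> V fold(Collection<V> perNode, BinaryOperator<V> op) {
    return perNode.stream()
        .reduce(op)
        .orElseThrow(() -> new IllegalArgumentException("empty set"));
  }

  /** Fold operator for unique keys: the union of both key sets. */
  static Set<Set<Integer>> unionKeys(Set<Set<Integer>> a, Set<Set<Integer>> b) {
    Set<Set<Integer>> out = new HashSet<>(a);
    out.addAll(b);
    return out;
  }

  public static void main(String[] args) {
    // Two equivalent nodes: one knows column 0 is a key, the other (1, 2).
    List<Set<Set<Integer>>> keys =
        List.of(Set.of(Set.of(0)), Set.of(Set.of(1, 2)));
    System.out.println(fold(keys, MetadataFoldSketch::unionKeys));

    // Minimum row counts of three equivalent nodes roll up with MAX.
    System.out.println(fold(List.of(0.0, 1.0, 1.0), Double::max));
  }
}
```

With this shape, each metadata type would only need to name its operator (union, MAX, and so on); where the folded value ends up being stored is left to the metadata system.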

As I said earlier, we should not focus on where the {{RelSet}}'s metadata is 
stored. Let the metadata system worry about that. Focus instead on how the 
metadata is derived.



> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-3963
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3963
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Xiening Dai
>            Assignee: Xiening Dai
>            Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc) 
> are maintained at RelNode level. This creates a number of meta data 
> consistency problems, e.g. CALCITE-1048, CALCITE-2166. 
> In theory, all RelNodes in a RelSet should share the same logical properties 
> per definition of relational equivalence. So it makes more sense to keep 
> logical properties at RelSet level, rather than the RelNode. And such 
> properties shouldn't change when new sub set is created or subset's best is 
> changed.
> Specifically I think below build in metadata should fall into the logical 
> properties category -
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
