[
https://issues.apache.org/jira/browse/CALCITE-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097015#comment-17097015
]
Xiening Dai commented on CALCITE-3963:
--------------------------------------
{quote}We shouldn't rely on the first rel or subset's best.{quote}
To explain this further, consider a simple join case that has two alternative
plans.
Plan A:
{code:java}
HashJoin
  TableScanA
  TableScanB
{code}
Plan B:
{code:java}
MergeJoin
  Sort
    TableScanA
  Sort
    TableScanB
{code}
Assuming the self costs of the hash join and the merge join are similar, plan A
is better since it doesn't incur sorting. But because these two join nodes have
different input subsets, their input row counts are decided by each subset's
best node. If for some reason we report a smaller row count in plan B's Sort
subset (in this simple example we shouldn't, but it's possible in the real world
when the input is much more complex), we could end up picking plan B because its
overall cost appears lower.
We've seen issues like this before.
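To make the effect concrete, here is a minimal sketch using a toy cost model. It is not Calcite's cost model or API: the class RowCountSkewDemo, its methods, the cost formulas (scan = rows, sort = n * log2(n), join = left * right) and the row counts are all assumptions chosen only for illustration. It shows that when the Sort subsets report the same row count as the scans, plan A is cheaper, but once the Sort subsets report a slightly smaller row count, plan B looks cheaper.

```java
public class RowCountSkewDemo {
    // Toy cost formulas (assumptions for illustration, not Calcite's cost model).
    static double scanCost(double rows) {
        return rows;
    }

    static double sortCost(double rows) {
        // n * log2(n); callers pass rows > 1
        return rows * (Math.log(rows) / Math.log(2));
    }

    static double joinSelfCost(double leftRows, double rightRows) {
        // Hash join and merge join share the same self cost here,
        // matching the "similar self cost" assumption in the comment above.
        return leftRows * rightRows;
    }

    /** Plan A: HashJoin directly over the two table scans. */
    static double planA(double rowsA, double rowsB) {
        return scanCost(rowsA) + scanCost(rowsB) + joinSelfCost(rowsA, rowsB);
    }

    /**
     * Plan B: MergeJoin over two Sorts. The join sees the row counts reported
     * by the Sort subsets' best nodes, which may differ from the scan row
     * counts if metadata is kept per RelNode.
     */
    static double planB(double rowsA, double rowsB,
                        double sortSubsetRowsA, double sortSubsetRowsB) {
        return scanCost(rowsA) + scanCost(rowsB)
            + sortCost(rowsA) + sortCost(rowsB)
            + joinSelfCost(sortSubsetRowsA, sortSubsetRowsB);
    }

    public static void main(String[] args) {
        double rows = 1000;

        // Consistent row counts: plan B only adds the sorting cost.
        System.out.println("A cheaper with consistent counts: "
            + (planA(rows, rows) < planB(rows, rows, rows, rows)));

        // Sort subsets under-report their row count (900 instead of 1000).
        System.out.println("B cheaper with skewed counts: "
            + (planB(rows, rows, 900, 900) < planA(rows, rows)));
    }
}
```

If the row count were a property of the equivalence set rather than of each subset's best node, both joins would see the same input row counts, and the comparison would come down to the sorting cost alone.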
> Maintains logical properties at RelSet (equivalent group) instead of RelNode
> ----------------------------------------------------------------------------
>
> Key: CALCITE-3963
> URL: https://issues.apache.org/jira/browse/CALCITE-3963
> Project: Calcite
> Issue Type: Bug
> Reporter: Xiening Dai
> Assignee: Xiening Dai
> Priority: Major
>
> Currently the logical properties (such as row count, distinct row count, etc.)
> are maintained at the RelNode level. This creates a number of metadata
> consistency problems, e.g. CALCITE-1048, CALCITE-2166.
> In theory, all RelNodes in a RelSet should share the same logical properties
> per the definition of relational equivalence. So it makes more sense to keep
> logical properties at the RelSet level, rather than the RelNode. And such
> properties shouldn't change when a new subset is created or a subset's best is
> changed.
> Specifically, I think the built-in metadata below should fall into the logical
> properties category:
> Selectivity
> UniqueKeys
> ColumnUniqueness
> RowCount
> MaxRowCount
> MinRowCount
> DistinctRowCount
> Size (averageRowSize, averageColumnSize)
>