[
https://issues.apache.org/jira/browse/CALCITE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188707#comment-17188707
]
Rui Wang edited comment on CALCITE-4208 at 9/1/20, 5:54 PM:
------------------------------------------------------------
I am not familiar with the context of the existing row count estimation model,
but based just on the formula here, I think:
{code:java}
innerJoinRowCount = leftRowCount * rightRowCount * mq.getSelectivity(join,
condition)
leftJoinRowCount = leftRowCount + innerJoinRowCount = leftRowCount * (1 +
rightRowCount * mq.getSelectivity(join, condition))
{code}
Similarly for right join.
So if rightRowCount * mq.getSelectivity(join, condition) is much larger than 1,
that 1 can be ignored. If 1 is the dominant part, the row count estimate won't
be a big number anyway.
I think that is why at least INNER/LEFT/RIGHT share the same model. A similar
argument could apply to FULL join.
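To make the argument concrete, here is a minimal sketch that evaluates both
formulas for a large and a small value of rightRowCount * selectivity. The
class and method names are hypothetical and are not part of Calcite; the
selectivity is passed in as a plain number instead of calling
mq.getSelectivity, and leftRowCount is assumed to be greater than zero.

```java
// Hypothetical illustration only: these names are not Calcite API.
final class JoinRowCountSketch {
  private JoinRowCountSketch() {}

  /** Inner-join estimate: leftRowCount * rightRowCount * selectivity. */
  static double innerJoinRowCount(double leftRowCount, double rightRowCount,
      double selectivity) {
    return leftRowCount * rightRowCount * selectivity;
  }

  /** Left-join estimate from the comment: left rows plus inner-join rows. */
  static double leftJoinRowCount(double leftRowCount, double rightRowCount,
      double selectivity) {
    return leftRowCount
        + innerJoinRowCount(leftRowCount, rightRowCount, selectivity);
  }

  /**
   * Share of the left-join estimate that the inner-join formula leaves out.
   * Algebraically this is 1 / (1 + rightRowCount * selectivity).
   * Assumes leftRowCount > 0.
   */
  static double relativeGap(double leftRowCount, double rightRowCount,
      double selectivity) {
    double inner = innerJoinRowCount(leftRowCount, rightRowCount, selectivity);
    double left = leftJoinRowCount(leftRowCount, rightRowCount, selectivity);
    return (left - inner) / left;
  }

  public static void main(String[] args) {
    // Each case is {leftRowCount, rightRowCount, selectivity}.
    double[][] cases = {
        {1000, 500, 0.1},     // rightRowCount * selectivity = 50, the "1" is negligible
        {1000, 500, 0.0001},  // rightRowCount * selectivity = 0.05, the "1" dominates
    };
    for (double[] c : cases) {
      System.out.printf("inner=%.1f left=%.1f gap=%.3f%n",
          innerJoinRowCount(c[0], c[1], c[2]),
          leftJoinRowCount(c[0], c[1], c[2]),
          relativeGap(c[0], c[1], c[2]));
    }
  }
}
```

In the first case the two estimates differ by only about 2%. In the second
case the inner-join formula misses most of the left-join estimate, but both
numbers stay close to leftRowCount, so neither is large.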
> Improve metadata row count for Join
> -----------------------------------
>
> Key: CALCITE-4208
> URL: https://issues.apache.org/jira/browse/CALCITE-4208
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Ruben Q L
> Priority: Major
>
> Currently, the default metadata row count for join
> {{RelMdRowCount#getRowCount(Join rel, RelMetadataQuery mq)}} relies on
> {{RelMdUtil.getJoinRowCount}}. This method has several issues:
> - In case of ANTI join, it returns the same estimation as a SEMI join
> - In other cases (INNER, LEFT, RIGHT, FULL), it always returns the same
> formula:
> {{leftRowCount * rightRowCount * mq.getSelectivity(join, condition)}}
> which seems valid for an INNER join, but not for LEFT / RIGHT / FULL.