[
https://issues.apache.org/jira/browse/CALCITE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198609#comment-17198609
]
Julian Hyde commented on CALCITE-3709:
--------------------------------------
I agree that for Filter (and Join) the cost metric should be "rows processed"
not "rows returned". It is a better measure of the amount of work done by the
node.
The "rows" statistic for Filter, Join (and all RelNodes) should continue to be
an estimate of the number of rows returned.
In explain plan you can show "rows rejected" if you like. But "rows rejected"
would not be a fundamental new metric.
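To make the distinction concrete, here is a minimal sketch in plain Java. It is not Calcite's API: the class FilterCostSketch, its method names, and the selectivity parameter are all hypothetical, chosen only to show how "rows processed" (the cost metric), "rows returned" (the statistic), and "rows rejected" (a derived display value) relate for a single Filter.

```java
/**
 * Hypothetical illustration only; none of these names exist in Calcite.
 * Assumes a Filter whose input row count and selectivity are already estimated.
 */
final class FilterCostSketch {
    private FilterCostSketch() {}

    /** "rows" statistic: estimated number of rows the Filter returns. */
    static double rowsReturned(double inputRows, double selectivity) {
        return inputRows * selectivity;
    }

    /** Cost metric: the Filter must examine every input row. */
    static double rowsProcessed(double inputRows) {
        return inputRows;
    }

    /** Derived value for display; not an independent metric. */
    static double rowsRejected(double inputRows, double selectivity) {
        return rowsProcessed(inputRows) - rowsReturned(inputRows, selectivity);
    }

    public static void main(String[] args) {
        double inputRows = 1000.0;   // assumed input estimate
        double selectivity = 0.25;   // assumed fraction of rows that pass

        System.out.println("rows returned:  " + rowsReturned(inputRows, selectivity));
        System.out.println("rows processed: " + rowsProcessed(inputRows));
        System.out.println("rows rejected:  " + rowsRejected(inputRows, selectivity));
    }
}
```

Because "rows rejected" is just processed minus returned, it can be shown in explain output without being stored as a separate metric.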
> Use "rejected row count" for RelOptCost#getRows
> -----------------------------------------------
>
> Key: CALCITE-3709
> URL: https://issues.apache.org/jira/browse/CALCITE-3709
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.21.0
> Reporter: Vladimir Sitnikov
> Priority: Major
>
> The current cost#rows has a problem: it does not add up well when computing
> cumulative cost.
> So the idea is to use the number of _rejected_ rows.
> Then the field would have a clear meaning:
> * If the value is high, the plan is probably rejecting a lot of unrelated
> rows, thus it is suboptimal
> * Extra Project/Calc nodes won't artificially increase rows in the cost
> fields. Currently each Project adds "rows" which is not very good.
> * It is clear what to put in the rows field: "rejected rows" is more-or-less
> understandable. For Project it would be 0.
> * Join/Filter/Calc nodes would show "estimated number of returned rows=X
> (from metadataquery), rejected rows=Y (from cost)" which would help in
> understanding where the time is spent
> That is inspired by PostgreSQL's "rows removed by filter" when running
> explain analyze (which is statement execution + collecting statistics on each
> execution plan node):
> http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.2#Explain_improvements