[
https://issues.apache.org/jira/browse/CALCITE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697778#comment-17697778
]
Ruben Q L commented on CALCITE-5559:
------------------------------------
Correct [~julianhyde], at this moment RepeatUnion has a control to avoid
returning duplicates in the "final" result if its flag all=false. The problem
is that if there are duplicates generated within iteration X, these duplicates
(even if the are correctly handled by the RepeatUnion) will be fed back as
input for the iteration X+1, leading to unnecessary extra processing.
To recap the full picture, SQL recursive union functionality in Calcite is
split in 3 main operators: RepeatUnion, TableSpool and TransientScan:
{noformat}
RepeatUnion(all: true/false)
TableSpool(table: DELTA)
... <seed subplan> ...
TableSpool(table: DELTA)
... <iterative subplan> ...
TransientScan(table: DELTA)
{noformat}
The RepeatUnion orchestrates the process: calls its left input (seed) once, and
then its right input (iterative) over and over until no more results are
produced (or until an optional iterationLimit is reached). The "magic" to make
the output of iteration X to become the input of iteration X+1 comes from the
TableSpools+TransientScan who "share" the same table (TableSpool feeds it, then
TransientScan consumes it in the next iteration). So, even though global
duplicates are discarded by the RepeatUnion at the top level, the duplicates
within a certain iteration are not discarded when the TableSpool feeds the
TransientScan, which may lead to unnecessary work in the following iteration.
In order to optimize that, we have mentioned two approaches:
A) Improve TableSpool (or rather provide a new TableSpool) so that it can deals
with duplicates in case of RepeatUnion all=false:
{noformat}
RepeatUnion(all: false)
DistinctTableSpool(table: DELTA)
... <seed subplan> ...
DistinctTableSpool(table: DELTA)
... <iterative subplan> ...
TransientScan(table: DELTA)
{noformat}
I have a PoC with this approach, I could create a PR right away.
B) Inject an Aggregate (distinct) before the TableSpool, with the limitation
that currently this would not work due to the "enumerator eager evaluation" of
aggregates, as explained on my last comment (so this would require some
preliminary work):
{noformat}
RepeatUnion(all: false)
TableSpool(table: DELTA)
Aggregate(distinct)
... <seed subplan> ...
TableSpool(table: DELTA)
Aggregate(distinct)
... <iterative subplan> ...
TransientScan(table: DELTA)
{noformat}
Finally, let's remember that RepeatUnion, TableSpool and TransientScan are
flagged as "experimental and subject to change without notice", so maybe
approach B's preliminary refactoring around Aggregate could be an overkill to
accommodate an optimization on an experimental feature; and A would seem a
simpler, more localized solution that would not impact other (non-experimental)
operators.
> Improve RepeatUnion by discarding duplicates at TableSpool level
> ----------------------------------------------------------------
>
> Key: CALCITE-5559
> URL: https://issues.apache.org/jira/browse/CALCITE-5559
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Ruben Q L
> Assignee: Ruben Q L
> Priority: Major
>
> Currently, RepeatUnion operator with all=false keeps track of the elements
> that it has returned in order to discard duplicates. However, the TableSpool
> operators that are right below it do not have such control. In certain
> scenarios, duplicates are returned by the TableSpool current iteration,
> discarded by the RepeatUnion, but have been already "fed back" by the
> TableSpool into the next iteration, causing unnecessary processing.
> We can optimize this scenario by keeping track of the duplicates
> inside/before the TableSpool too (note: we still need to keep track of
> duplicates at RepeatUnion level, because that is the only place where we can
> detect a potential "global duplicate" of an element: returned by the LHS and
> then also by the RHS, or by two different iterations of the RHS).
> A PoC testing this improvement on a downstream project showed that certain
> queries can go from ~40s down to ~1s.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)