Re: How to optimize repeated RelNode Structures? (CALCITE-3806)

Christian Beikov Thu, 20 Feb 2020 06:54:43 -0800

I guess what you are looking for is something like the SubPlan node of
PostgreSQL? Such a node would materialize the results to be used by
multiple other nodes or in a nested loop context to avoid accessing a
source relation multiple times.


Would be cool if Calcite could create a SubPlan and let the union all node
use that instead. Something like this

Materialize(name=Sub1)

  LogicalProject(EMPNO=[$0])

     LogicalFilter(condition=[OR(>=($0, 7369), >=($0, 7369))])

       LogicalTableScan(table=[[scott, EMP]])

LogicalUnion(all=[true])

   LogicalProject(EMPNO=[$0])

     LogicalFilter(condition=[>=($0, 7369)])

       LogicalTableScan(table=[[Sub1]])

   LogicalProject(EMPNO=[$0])

     LogicalFilter(condition=[>=($0, 7369)])

       LogicalTableScan(table=[[Sub1]])

Of course there are further optimizations that could be done like removing
filters etc. but that's just a general example.

Not sure if the Spool operator does that, but such a translation could also
come in handy for distributed queries where every row access to a remote
system has protocol overhead.

Danny Chan <[email protected]> schrieb am Do., 20. Feb. 2020, 15:27:

> Calcite has a Spool operator, maybe you can check that.
>
> Anjali Shrishrimal <[email protected]>于2020年2月20日
> 周四下午3:20写道：
>
> > Hi everybody,
> >
> > I would like to have your suggestions on CALCITE-3806.
> >
> > Asking it here as suggested by Julian.
> >
> >
> >
> >
> >
> > If RelNode tree contains a subtree whose result can be obtained by some
> > other part of the same tree,
> >
> > can we optimize it ? and how to express it in plan ?
> >
> >
> >
> > For example,
> >
> > Let's say input structure looks like this :
> >
> >
> >
> > LogicalUnion(all=[true])
> >
> >   LogicalProject(EMPNO=[$0])
> >
> >     LogicalFilter(condition=[>=($0, 7369)])
> >
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> >   LogicalProject(EMPNO=[$0])
> >
> >     LogicalFilter(condition=[>=($0, 7369)])
> >
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> >
> >
> >
> >
> > In this case,
> >
> >
> >
> >   LogicalProject(EMPNO=[$0])
> >
> >     LogicalFilter(condition=[>=($0, 7369)])
> >
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> >
> >
> > is repeated. It is going to fetch same data twice.
> >
> > Can we save one fetch? Can we somehow tell 2nd input of union to make use
> > of union's 1st input. Is there any way to express that in plan?
> >
> >
> >
> > Also,
> > If the structure was like this :
> >
> >
> >
> > LogicalUnion(all=[true])
> >
> >   LogicalProject(EMPNO=[$0])
> >
> >     LogicalFilter(condition=[>=($0, 7369)])
> >
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> >   LogicalProject(EMPNO=[$0])
> >
> >     LogicalFilter(condition=[>=($0, 8000)])
> >
> >       LogicalTableScan(table=[[scott, EMP]])
> >
> >
> >
> > Second part of union can perform filtering on fetched data of 1st part.
> > (As second's output is subset of first's output)
> >
> >
> >
> > Does calcite provide such kind of optimizations ?
> >
> > If not, what are the challenges to do so?
> >
> >
> >
> >
> >
> >
> >
> > Would love to hear your thoughts.
> >
> >
> >
> >
> > Thank you,
> > Anjali Shrishrimal
> >
>

Re: How to optimize repeated RelNode Structures? (CALCITE-3806)

Reply via email to