Hello,
Stamatis and I have been discussing about this topic, and currently we have
two main approaches on how to represent the logical plan.
Consider the following basic example: a SQL recursive union query that
generates numbers 1,2,..10:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 10
)
SELECT * FROM t
*Version 1*
LogicalRepeatUnionV1(table="t", maxRep=-1, all=true)
LogicalValues({1})
LogicalProject($0=$0+1)
LogicalFilter($0<10)
LogicalTableScan(table="t")
The operator LogicalRepeatUnionV1 will be a pure translation of the
SQL recursive union mechanism (it could be named LogicalRecursiveUnion
indeed).
Its behavior (as described e.g. here
<https://www.postgresql.org/docs/9.1/queries-with.html>) will be:
- Evaluate the left input once, place the results in a temporary
working table ("t" in our example)
- Evaluate the right input, saving its results in an intermediate
location. When no more results are returned, the content of "t" will
be replaced with the results from the intermediate location, and the
right input will be re-evaluated again. This process will repeat over
and over, until it produces no more results (or until an optional
value maxRep is reached, the default value being -1, i.e. no limit).
*Version 2*
LogicalRepeatUnionV2 (maxRep=-1, all=true)
LogicalTableSpool(table="t")
LogicalValues({1})
LogicalTableSpool(table="t")
LogicalProject($0=$0+1)
LogicalFilter($0<10)
LogicalTableScan(table="t")
The operator LogicalRepeatUnionV2 will perform the following tasks:
- Evaluate the left input once.
- Evaluate the right input over and over until it produces no more
results (or until an optional value maxRep is reached, the default
value being -1, i.e. no limit).
As we can see, LogicalRepeatUnionV2 will be in charge on the iteration
only, it will not be responsible for storing the results for the next
iteration that is needed for the SQL recursive union, this logic will
be delegated to a LogicalTableSpool.
Both versions will satisfy our requirements in order to implement the
SQL recursive union, the main difference being:
- Version1 will produce a simpler and more concise logical plan, but
the LogicalRepeatUnionV1 operator will only fit this purpose and could
not be reused in the future in other contexts.
- Version2 requires a slightly more complicated logical plan, with the
addition of the table spools. However, its LogicalRepeatUnionV2
operator is more "purpose-agnostic", and can be reused in the future
for other plans that would require repetition. For instance, it could
be used to implement a for loop: let us say that we need to build a
plan where we must run a certain UserDefinedFunction ten times
(because each time its results will vary depending on whatever
factor). A possible way to implement that will be using
LogicalRepeatUnionV2, with maxRep=10, an empty values as left input
(no results actually needed from that side) and the
UserDefinedFunction as right input:
LogicalRepeatUnionV2(maxRep=10, all=true)
Values({})
UserDefinedFunction(...)
We prefer Version2 that seems to be more useful in the long term, so
we are ready to go on with this approach if there are no objections
(keeping in mind that this is still an "experimental" feature).
How does it look?
Best regards,
Ruben
Le mar. 9 avr. 2019 à 01:07, Walaa Eldin Moustafa <[email protected]> a
écrit :
> Stamatis, here is some recent work we did on that topic:
> http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
>
> References section has more pointers to other related work.
>
> Thanks,
> Walaa.
>
> On Sun, Apr 7, 2019 at 11:15 PM Stamatis Zampetakis <[email protected]>
> wrote:
> >
> > Hello,
> >
> > @Julian: It looks like an interesting project, looking forward to see
> what
> > comes out of it.
> >
> > @Walaa: The current implementation takes care of UNION ALL but it seems
> > relatively easy to extend it for UNION. At least with a naive
> > implementation. Research-wise, I am not very familiar with the domain so
> if
> > you could share a few papers it would be nice :)
> >
> > @Ruben: Thanks for the changes and great work so far. I will have a look
> > when I find some time.
> >
> > Best,
> > Stamatis
> >
> > On Fri, Apr 5, 2019 at 3:48 PM Ruben Q L <[email protected]> wrote:
> >
> > > Thanks for the feedback.
> > > I have pushed <https://github.com/apache/calcite/pull/1020> the latest
> > > modifications. As requested, I have flagged all new features as
> > > "experimental".
> > >
> > > Best regards,
> > > Ruben
> > >
> > >
> > > Le jeu. 4 avr. 2019 à 20:29, Walaa Eldin Moustafa <
> [email protected]>
> > > a
> > > écrit :
> > >
> > > > +1 to Julian.
> > > >
> > > > There are open questions on supporting UNION (instead of UNION ALL),
> > > > aggregation and negation through recursion. Many of those topics
> still
> > > > have papers written to address their open questions until today. I
> > > > think the PR is geared towards addressing the UNION ALL case, and
> > > > extending the architecture to cover the more involved cases may be
> > > > tricky if we commit to the UNION ALL friendly approach.
> > > >
> > > > Thanks,
> > > > Walaa.
> > > >
> > > > On Thu, Apr 4, 2019 at 11:23 AM Julian Hyde <[email protected]>
> wrote:
> > > > >
> > > > > I guess my concerns would all be met if we could flag this as
> > > > “experimental”.
> > > > >
> > > > > There are more general forms of recursive / deductive queries that
> I
> > > > would like us to be able to tackle someday. If and when we do, I
> don’t
> > > want
> > > > to be locked into the structures and nomenclature of this change. But
> > > until
> > > > then, this change is a huge step forward. If the APIs were labeled
> > > > “experimental and subject to change without notice” then we can
> evolve as
> > > > we know more.
> > > > >
> > > > > By the way, I have personal project where I am experimenting with
> > > > melding functional programming with relational algebra. If you have
> > > > functions-as-values in relational algebra then you can add a
> fixed-point
> > > > operator and thereby create recursive/deductive queries. You can use
> > > > higher-order functions such as map and filter on nested collections,
> and
> > > > exploit the fact that the relational operators (project, filter,
> join)
> > > are
> > > > themselves higher-order functions. Rather than adding functional
> > > extensions
> > > > to a relational language, I started digging the tunnel from the other
> > > end:
> > > > I started with a small, elegant functional language (ML), wrote an
> > > > interpreter in Java, and am extending the language for relational
> > > algebra.
> > > > If all goes well, there would be some extensions in Calcite’s
> algebra for
> > > > functions-as-values and relational expressions that are defined by
> > > > (possibly recursive) functions.
> > > > >
> > > > > Julian
> > > > >
> > > > > [1] https://github.com/julianhyde/smlj <
> > > > https://github.com/julianhyde/smlj>
> > > > >
> > > > >
> > > > > > On Apr 4, 2019, at 9:31 AM, Stamatis Zampetakis <
> [email protected]>
> > > > wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > The issue has advanced quite a lot and I guess it will be
> finalised
> > > > soon.
> > > > > > There is some ongoing discussion about a few naming/refactorings
> in
> > > > the PR
> > > > > > [1] but these could be resolved easily.
> > > > > >
> > > > > > Ruben, had made a proposal which LGTM so I am inclined to merge
> the
> > > PR
> > > > as
> > > > > > soon as the last changes are made (and no other serious issues
> > > appear).
> > > > > >
> > > > > > I guess it is the right time to jump in to the discusion,
> especially
> > > > if you
> > > > > > strongly disagree on the general approach.
> > > > > >
> > > > > > [1] https://github.com/apache/calcite/pull/1020
> > > > > >
> > > > > > Best,
> > > > > > Stamatis
> > > > >
> > > >
> > >
>