Are you considering refactoring the UNION ALL approach?

On Thu, Apr 18, 2019 at 6:04 AM Ruben Q L <[email protected]> wrote:

> Hello,
>
> Stamatis and I have been discussing about this topic, and currently we have
> two main approaches on how to represent the logical plan.
>
> Consider the following basic example: a SQL recursive union query that
> generates numbers 1,2,..10:
>
>    WITH RECURSIVE t(n) AS (
>      VALUES (1)
>      UNION ALL
>      SELECT n+1 FROM t WHERE n < 10
>    )
>    SELECT * FROM t
>
> *Version 1*
>
> LogicalRepeatUnionV1(table="t", maxRep=-1, all=true)
>   LogicalValues({1})
>   LogicalProject($0=$0+1)
>     LogicalFilter($0<10)
>       LogicalTableScan(table="t")
>
> The operator LogicalRepeatUnionV1 will be a pure translation of the
> SQL recursive union mechanism (it could be named LogicalRecursiveUnion
> indeed).
> Its behavior (as described e.g. here
> <https://www.postgresql.org/docs/9.1/queries-with.html>) will be:
>   - Evaluate the left input once, place the results in a temporary
> working table ("t" in our example)
>   - Evaluate the right input, saving its results in an intermediate
> location. When no more results are returned, the content of "t" will
> be replaced with the results from the intermediate location, and the
> right input will be re-evaluated again. This process will repeat over
> and over, until it produces no more results (or until an optional
> value maxRep is reached, the default value being -1, i.e. no limit).
>
> *Version 2*
>
> LogicalRepeatUnionV2 (maxRep=-1, all=true)
>   LogicalTableSpool(table="t")
>     LogicalValues({1})
>   LogicalTableSpool(table="t")
>     LogicalProject($0=$0+1)
>       LogicalFilter($0<10)
>         LogicalTableScan(table="t")
>
> The operator LogicalRepeatUnionV2 will perform the following tasks:
>   - Evaluate the left input once.
>   - Evaluate the right input over and over until it produces no more
> results (or until an optional value maxRep is reached, the default
> value being -1, i.e. no limit).
> As we can see, LogicalRepeatUnionV2 will be in charge on the iteration
> only, it will not be responsible for storing the results for the next
> iteration that is needed for the SQL recursive union, this logic will
> be delegated to a LogicalTableSpool.
>
> Both versions will satisfy our requirements in order to implement the
> SQL recursive union, the main difference being:
> - Version1 will produce a simpler and more concise logical plan, but
> the LogicalRepeatUnionV1 operator will only fit this purpose and could
> not be reused in the future in other contexts.
> - Version2 requires a slightly more complicated logical plan, with the
> addition of the table spools. However, its LogicalRepeatUnionV2
> operator is more "purpose-agnostic", and can be reused in the future
> for other plans that would require repetition. For instance, it could
> be used to implement a for loop: let us say that we need to build a
> plan where we must run a certain UserDefinedFunction ten times
> (because each time its results will vary depending on whatever
> factor). A possible way to implement that will be using
> LogicalRepeatUnionV2, with maxRep=10, an empty values as left input
> (no results actually needed from that side) and the
> UserDefinedFunction as right input:
>
> LogicalRepeatUnionV2(maxRep=10, all=true)
>   Values({})
>   UserDefinedFunction(...)
>
> We prefer Version2 that seems to be more useful in the long term, so
> we are ready to go on with this approach if there are no objections
> (keeping in mind that this is still an "experimental" feature).
>
> How does it look?
>
> Best regards,
> Ruben
>
>
>
>
>
> Le mar. 9 avr. 2019 à 01:07, Walaa Eldin Moustafa <[email protected]>
> a
> écrit :
>
> > Stamatis, here is some recent work we did on that topic:
> > http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf
> >
> > References section has more pointers to other related work.
> >
> > Thanks,
> > Walaa.
> >
> > On Sun, Apr 7, 2019 at 11:15 PM Stamatis Zampetakis <[email protected]>
> > wrote:
> > >
> > > Hello,
> > >
> > > @Julian: It looks like an interesting project, looking forward to see
> > what
> > > comes out of it.
> > >
> > > @Walaa: The current implementation takes care of UNION ALL but it seems
> > > relatively easy to extend it for UNION. At least with a naive
> > > implementation. Research-wise, I am not very familiar with the domain
> so
> > if
> > > you could share a few papers it would be nice :)
> > >
> > > @Ruben: Thanks for the changes and great work so far. I will have a
> look
> > > when I find some time.
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Fri, Apr 5, 2019 at 3:48 PM Ruben Q L <[email protected]> wrote:
> > >
> > > > Thanks for the feedback.
> > > > I have pushed <https://github.com/apache/calcite/pull/1020> the
> latest
> > > > modifications. As requested, I have flagged all new features as
> > > > "experimental".
> > > >
> > > > Best regards,
> > > > Ruben
> > > >
> > > >
> > > > Le jeu. 4 avr. 2019 à 20:29, Walaa Eldin Moustafa <
> > [email protected]>
> > > > a
> > > > écrit :
> > > >
> > > > > +1 to Julian.
> > > > >
> > > > > There are open questions on supporting UNION (instead of UNION
> ALL),
> > > > > aggregation and negation through recursion. Many of those topics
> > still
> > > > > have papers written to address their open questions until today. I
> > > > > think the PR is geared towards addressing the UNION ALL case, and
> > > > > extending the architecture to cover the more involved cases may be
> > > > > tricky if we commit to the UNION ALL friendly approach.
> > > > >
> > > > > Thanks,
> > > > > Walaa.
> > > > >
> > > > > On Thu, Apr 4, 2019 at 11:23 AM Julian Hyde <[email protected]>
> > wrote:
> > > > > >
> > > > > > I guess my concerns would all be met if we could flag this as
> > > > > “experimental”.
> > > > > >
> > > > > > There are more general forms of recursive / deductive queries
> that
> > I
> > > > > would like us to be able to tackle someday. If and when we do, I
> > don’t
> > > > want
> > > > > to be locked into the structures and nomenclature of this change.
> But
> > > > until
> > > > > then, this change is a huge step forward. If the APIs were labeled
> > > > > “experimental and subject to change without notice” then we can
> > evolve as
> > > > > we know more.
> > > > > >
> > > > > > By the way, I have personal project where I am experimenting with
> > > > > melding functional programming with relational algebra. If you have
> > > > > functions-as-values in relational algebra then you can add a
> > fixed-point
> > > > > operator and thereby create recursive/deductive queries. You can
> use
> > > > > higher-order functions such as map and filter on nested
> collections,
> > and
> > > > > exploit the fact that the relational operators (project, filter,
> > join)
> > > > are
> > > > > themselves higher-order functions. Rather than adding functional
> > > > extensions
> > > > > to a relational language, I started digging the tunnel from the
> other
> > > > end:
> > > > > I started with a small, elegant functional language (ML), wrote an
> > > > > interpreter in Java, and am extending the language for relational
> > > > algebra.
> > > > > If all goes well, there would be some extensions in Calcite’s
> > algebra for
> > > > > functions-as-values and relational expressions that are defined by
> > > > > (possibly recursive) functions.
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > [1] https://github.com/julianhyde/smlj <
> > > > > https://github.com/julianhyde/smlj>
> > > > > >
> > > > > >
> > > > > > > On Apr 4, 2019, at 9:31 AM, Stamatis Zampetakis <
> > [email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > The issue has advanced quite a lot and I guess it will be
> > finalised
> > > > > soon.
> > > > > > > There is some ongoing discussion about a few
> naming/refactorings
> > in
> > > > > the PR
> > > > > > > [1] but these could be resolved easily.
> > > > > > >
> > > > > > > Ruben, had made a proposal which LGTM so I am inclined to merge
> > the
> > > > PR
> > > > > as
> > > > > > > soon as the last changes are made (and no other serious issues
> > > > appear).
> > > > > > >
> > > > > > > I guess it is the right time to jump in to the discusion,
> > especially
> > > > > if you
> > > > > > > strongly disagree on the general approach.
> > > > > > >
> > > > > > > [1] https://github.com/apache/calcite/pull/1020
> > > > > > >
> > > > > > > Best,
> > > > > > > Stamatis
> > > > > >
> > > > >
> > > >
> >
>

Reply via email to