@Walaa, our idea would be to provide a boolean 'all' flag in the parent abstract class RepeatUnion, that will be inherited by the Logical operator and the Enumerable operator. For the moment, we just propose the implementation of RECURSIVE UNION ALL version, so in this initial commit the EnumerableRepeatUnion will support only 'all=true' (it would throw an UnsupportedOperationException on its implement method otherwise). But in the future, the 'all=false' could be implemented, in order to do so, the EnumerableRepeatUnion will have to "remember" every item that it has returned, and disregard duplicates when evaluating its left / right input. I think both versions 1 and 2 could eventually be enhanced to implement this.
Le jeu. 18 avr. 2019 à 16:36, Walaa Eldin Moustafa <[email protected]> a écrit : > Are you considering refactoring the UNION ALL approach? > > On Thu, Apr 18, 2019 at 6:04 AM Ruben Q L <[email protected]> wrote: > > > Hello, > > > > Stamatis and I have been discussing about this topic, and currently we > have > > two main approaches on how to represent the logical plan. > > > > Consider the following basic example: a SQL recursive union query that > > generates numbers 1,2,..10: > > > > WITH RECURSIVE t(n) AS ( > > VALUES (1) > > UNION ALL > > SELECT n+1 FROM t WHERE n < 10 > > ) > > SELECT * FROM t > > > > *Version 1* > > > > LogicalRepeatUnionV1(table="t", maxRep=-1, all=true) > > LogicalValues({1}) > > LogicalProject($0=$0+1) > > LogicalFilter($0<10) > > LogicalTableScan(table="t") > > > > The operator LogicalRepeatUnionV1 will be a pure translation of the > > SQL recursive union mechanism (it could be named LogicalRecursiveUnion > > indeed). > > Its behavior (as described e.g. here > > <https://www.postgresql.org/docs/9.1/queries-with.html>) will be: > > - Evaluate the left input once, place the results in a temporary > > working table ("t" in our example) > > - Evaluate the right input, saving its results in an intermediate > > location. When no more results are returned, the content of "t" will > > be replaced with the results from the intermediate location, and the > > right input will be re-evaluated again. This process will repeat over > > and over, until it produces no more results (or until an optional > > value maxRep is reached, the default value being -1, i.e. no limit). > > > > *Version 2* > > > > LogicalRepeatUnionV2 (maxRep=-1, all=true) > > LogicalTableSpool(table="t") > > LogicalValues({1}) > > LogicalTableSpool(table="t") > > LogicalProject($0=$0+1) > > LogicalFilter($0<10) > > LogicalTableScan(table="t") > > > > The operator LogicalRepeatUnionV2 will perform the following tasks: > > - Evaluate the left input once. > > - Evaluate the right input over and over until it produces no more > > results (or until an optional value maxRep is reached, the default > > value being -1, i.e. no limit). > > As we can see, LogicalRepeatUnionV2 will be in charge on the iteration > > only, it will not be responsible for storing the results for the next > > iteration that is needed for the SQL recursive union, this logic will > > be delegated to a LogicalTableSpool. > > > > Both versions will satisfy our requirements in order to implement the > > SQL recursive union, the main difference being: > > - Version1 will produce a simpler and more concise logical plan, but > > the LogicalRepeatUnionV1 operator will only fit this purpose and could > > not be reused in the future in other contexts. > > - Version2 requires a slightly more complicated logical plan, with the > > addition of the table spools. However, its LogicalRepeatUnionV2 > > operator is more "purpose-agnostic", and can be reused in the future > > for other plans that would require repetition. For instance, it could > > be used to implement a for loop: let us say that we need to build a > > plan where we must run a certain UserDefinedFunction ten times > > (because each time its results will vary depending on whatever > > factor). A possible way to implement that will be using > > LogicalRepeatUnionV2, with maxRep=10, an empty values as left input > > (no results actually needed from that side) and the > > UserDefinedFunction as right input: > > > > LogicalRepeatUnionV2(maxRep=10, all=true) > > Values({}) > > UserDefinedFunction(...) > > > > We prefer Version2 that seems to be more useful in the long term, so > > we are ready to go on with this approach if there are no objections > > (keeping in mind that this is still an "experimental" feature). > > > > How does it look? > > > > Best regards, > > Ruben > > > > > > > > > > > > Le mar. 9 avr. 2019 à 01:07, Walaa Eldin Moustafa <[email protected] > > > > a > > écrit : > > > > > Stamatis, here is some recent work we did on that topic: > > > http://www.sysnet.ucsd.edu/sysnet/miscpapers/datalog-icbd16.pdf > > > > > > References section has more pointers to other related work. > > > > > > Thanks, > > > Walaa. > > > > > > On Sun, Apr 7, 2019 at 11:15 PM Stamatis Zampetakis <[email protected] > > > > > wrote: > > > > > > > > Hello, > > > > > > > > @Julian: It looks like an interesting project, looking forward to see > > > what > > > > comes out of it. > > > > > > > > @Walaa: The current implementation takes care of UNION ALL but it > seems > > > > relatively easy to extend it for UNION. At least with a naive > > > > implementation. Research-wise, I am not very familiar with the domain > > so > > > if > > > > you could share a few papers it would be nice :) > > > > > > > > @Ruben: Thanks for the changes and great work so far. I will have a > > look > > > > when I find some time. > > > > > > > > Best, > > > > Stamatis > > > > > > > > On Fri, Apr 5, 2019 at 3:48 PM Ruben Q L <[email protected]> wrote: > > > > > > > > > Thanks for the feedback. > > > > > I have pushed <https://github.com/apache/calcite/pull/1020> the > > latest > > > > > modifications. As requested, I have flagged all new features as > > > > > "experimental". > > > > > > > > > > Best regards, > > > > > Ruben > > > > > > > > > > > > > > > Le jeu. 4 avr. 2019 à 20:29, Walaa Eldin Moustafa < > > > [email protected]> > > > > > a > > > > > écrit : > > > > > > > > > > > +1 to Julian. > > > > > > > > > > > > There are open questions on supporting UNION (instead of UNION > > ALL), > > > > > > aggregation and negation through recursion. Many of those topics > > > still > > > > > > have papers written to address their open questions until today. > I > > > > > > think the PR is geared towards addressing the UNION ALL case, and > > > > > > extending the architecture to cover the more involved cases may > be > > > > > > tricky if we commit to the UNION ALL friendly approach. > > > > > > > > > > > > Thanks, > > > > > > Walaa. > > > > > > > > > > > > On Thu, Apr 4, 2019 at 11:23 AM Julian Hyde <[email protected]> > > > wrote: > > > > > > > > > > > > > > I guess my concerns would all be met if we could flag this as > > > > > > “experimental”. > > > > > > > > > > > > > > There are more general forms of recursive / deductive queries > > that > > > I > > > > > > would like us to be able to tackle someday. If and when we do, I > > > don’t > > > > > want > > > > > > to be locked into the structures and nomenclature of this change. > > But > > > > > until > > > > > > then, this change is a huge step forward. If the APIs were > labeled > > > > > > “experimental and subject to change without notice” then we can > > > evolve as > > > > > > we know more. > > > > > > > > > > > > > > By the way, I have personal project where I am experimenting > with > > > > > > melding functional programming with relational algebra. If you > have > > > > > > functions-as-values in relational algebra then you can add a > > > fixed-point > > > > > > operator and thereby create recursive/deductive queries. You can > > use > > > > > > higher-order functions such as map and filter on nested > > collections, > > > and > > > > > > exploit the fact that the relational operators (project, filter, > > > join) > > > > > are > > > > > > themselves higher-order functions. Rather than adding functional > > > > > extensions > > > > > > to a relational language, I started digging the tunnel from the > > other > > > > > end: > > > > > > I started with a small, elegant functional language (ML), wrote > an > > > > > > interpreter in Java, and am extending the language for relational > > > > > algebra. > > > > > > If all goes well, there would be some extensions in Calcite’s > > > algebra for > > > > > > functions-as-values and relational expressions that are defined > by > > > > > > (possibly recursive) functions. > > > > > > > > > > > > > > Julian > > > > > > > > > > > > > > [1] https://github.com/julianhyde/smlj < > > > > > > https://github.com/julianhyde/smlj> > > > > > > > > > > > > > > > > > > > > > > On Apr 4, 2019, at 9:31 AM, Stamatis Zampetakis < > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > The issue has advanced quite a lot and I guess it will be > > > finalised > > > > > > soon. > > > > > > > > There is some ongoing discussion about a few > > naming/refactorings > > > in > > > > > > the PR > > > > > > > > [1] but these could be resolved easily. > > > > > > > > > > > > > > > > Ruben, had made a proposal which LGTM so I am inclined to > merge > > > the > > > > > PR > > > > > > as > > > > > > > > soon as the last changes are made (and no other serious > issues > > > > > appear). > > > > > > > > > > > > > > > > I guess it is the right time to jump in to the discusion, > > > especially > > > > > > if you > > > > > > > > strongly disagree on the general approach. > > > > > > > > > > > > > > > > [1] https://github.com/apache/calcite/pull/1020 > > > > > > > > > > > > > > > > Best, > > > > > > > > Stamatis > > > > > > > > > > > > > > > > > > > > > > > >
