Thanks Julian and Jesus for your feedback. Jesus> If I remember correctly, the rule pushes the Sort through the Join (if possible), but it also preserves the Sort on top of the Join to ensure correctness.
Indeed, I was mislead by the name of the rule and thought that the Sort is completely pushed below the join. Julian> Or perhaps we leave the Sort and have a rule that notices the output order of the Join and, based on that, weakens[1] or removes the Sort. There is a rule for removing the Sort if its input is already ordered (i.e., SortRemoveRule). Julian> And the rule could also be extended to deal with Sort operators that do not have a limit. Limits seriously constrain what the rule can safely do, and if there is no limit, we can safely push through inner join. In cases where there is a Sort with a limit can't we push only the sort and leave the limit as is if both cannot be pushed together? Ideally, what I want to achieve is push the sort down as much as possible in order to exploit an index with an interesting order and remove the sort if the join (and other intermediate operators are ordered in the same way). For the moment, I am relying a lot on the Enumerable operators so based on the discussion so far and by looking in the existing code, I guess what is missing for my use-case is: (i) to enrich the RelTraitSet of the Enumerable join variants to properly declare if they preserve the order of their input(s) (only EnumerableMergeJoin provides such information at the moment); (ii) extend the SortJoinTransposeRule to be able to handle more cases notably the INNER join which is very frequent; (iii) possibly add a rule that uses nested loops instead of hash join to perform equijoins (since queries with index scans and limit can be much faster if the join does not need to bring the whole relation in memory). Regarding (i) I have the impression (still needs to be verified) that: (a) the hash join implementation of Calcite (EnumerableDefaults#_join) retains the order of the outer/probing relation; (b) the nested loop join implementation of Calcite (EnumerableDefaults#correlateJoin) retains the order of the outer relation. If anybody else has already an idea of what exactly holds for the physical join algorithms of Calcite, please share it here. Best, Stamatis Στις Πέμ, 6 Σεπ 2018 στις 8:07 μ.μ., ο/η Julian Hyde <[email protected]> έγραψε: > I’d forgotten about LIMIT. > > The rule could be extended to push limit through inner join if there is a > foreign key (e.g. we know that the join is a lookup that does not > increase/decrease the number of rows). > > And the rule could also be extended to deal with Sort operators that do > not have a limit. Limits seriously constrain what the rule can safely do, > and if there is no limit, we can safely push through inner join. > > Julian > > > > On Sep 6, 2018, at 10:39 AM, Jesus Camacho Rodriguez < > [email protected]> wrote: > > > > The idea for that rule was to be able to exploit the limit/fetch spec of > the Sort operator to reduce the number of rows that needed to be joined, > that is why it was only applied to LEFT/RIGHT outer join. > > > > I think option 2 below sounds better than creating a new rule variant. > > > > Thanks, > > Jesús > > > > > > On 9/6/18, 10:28 AM, "Julian Hyde" <[email protected]> wrote: > > > > Ah, that makes sense. > > > > Reading the code, I couldn’t figure out why it applies to LEFT and > RIGHT but not to INNER. (For some kinds of join, for example inner merge > join, it could push the sort to both sides, as long as the sort was > compatible with what is needed to ensure that the keys arrive at the right > time.) > > > > If needed, we could have a variant of the rule that omits the Sort > after the Join. Or perhaps we leave the Sort and have a rule that notices > the output order of the Join and, based on that, weakens[1] or removes the > Sort. > > > > Julian > > > > [1] https://issues.apache.org/jira/browse/CALCITE-2540 < > https://issues.apache.org/jira/browse/CALCITE-2540> > > > > > > > >> On Sep 6, 2018, at 10:08 AM, Jesus Camacho Rodriguez < > [email protected]> wrote: > >> > >> If I remember correctly, the rule pushes the Sort through the Join (if > possible), but it also preserves the Sort on top of the Join to ensure > correctness. > >> > >> -Jesús > >> > >> > >> On 9/6/18, 9:57 AM, "Julian Hyde" <[email protected]> wrote: > >> > >> Yes, it depends very much on the operator. Some examples: > >> Merge join typically requires inputs to be sorted, and preserves that > order. (But some outer joins may throw in null values out of order.) > >> Map join typically preserves the order of the probing side, not the > build side. > >> Hash join typically destroys the order of both sides. > >> Use the rule with caution. > >> > >> Julian > >> > >> > >>> On Sep 6, 2018, at 9:33 AM, Stamatis Zampetakis <[email protected]> > wrote: > >>> > >>> Hello, > >>> > >>> I noticed that there is a Calcite rule (i.e., SortJoinTransposeRule) > that > >>> pushes a LogicalSort past a LogicalJoin if the join is either left > outer or > >>> right outer. > >>> > >>> Who guarantees that the left and right outer joins are preserving the > order > >>> of the inputs? > >>> Does the SQL standard requires that these types of joins are order > >>> preserving? > >>> > >>> Since we are working with logical operators, I would tend to think > that we > >>> cannot assume anything about the physical equivalent. > >>> > >>> Best, > >>> Stamatis > >> > >> > >> > > > > > > > >
