Tried with swapOuterJoins = true and got two failures in the calcite/core
tests, both caused by switching a left join to a right join. I think that
in turn is because EnumerableJoin assigns most of its cost to the left
side (by mistake). So I went ahead and changed computeSelfCost() in
EnumerableJoin, and this time I got 10 failures (all due to differences in
explain plans).
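For illustration, here is a minimal sketch of the cost shape the thread argues for: a hash join that builds a hash table on the right input and probes it with the left should come out cheaper when the larger input is on the left. The class HashJoinCostSketch and its per-row weights are hypothetical. This is not the actual body of EnumerableJoin.computeSelfCost().

```java
// Hypothetical sketch: NOT Calcite's EnumerableJoin.computeSelfCost().
public final class HashJoinCostSketch {
    // Assumed relative weights: inserting a row into a hash table
    // costs more than probing the table with a row.
    static final double BUILD_COST_PER_ROW = 2.0;
    static final double PROBE_COST_PER_ROW = 1.0;

    /** Cost of a hash join that builds the right input and probes with the left. */
    static double cost(double leftRows, double rightRows) {
        return leftRows * PROBE_COST_PER_ROW + rightRows * BUILD_COST_PER_ROW;
    }

    public static void main(String[] args) {
        double big = 1_000_000, small = 1_000;
        System.out.println("large input on left:  " + cost(big, small));
        System.out.println("large input on right: " + cost(small, big));
    }
}
```

With these made-up weights, the 1,000,000-row input on the left costs 1002000.0 and on the right 2001000.0, so an optimizer comparing the two plans would keep the larger input on the left.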

I'll file a JIRA for each of the two issues above. But I would like to
work around this quickly by adding a SWAP_OUTER static instance to
JoinCommuteRule. Would that be OK?
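The proposal amounts to a second, preconfigured instance of the rule that is allowed to commute outer joins, turning LEFT into RIGHT and vice versa. The simplified, self-contained model below illustrates that idea. JoinCommuteSketch, its Join record and its JoinType enum are hypothetical and do not reflect Calcite's real JoinCommuteRule API. A real rule would also need to restore the original column order, because swapping the inputs reorders the output fields. That step is omitted here.

```java
// Hypothetical, simplified model of commuting a join; NOT Calcite's JoinCommuteRule API.
public final class JoinCommuteSketch {
    enum JoinType { INNER, LEFT, RIGHT, FULL }

    record Join(String left, String right, JoinType type) {}

    private final boolean swapOuterJoins;

    // Default instance: only inner joins are commuted.
    static final JoinCommuteSketch INSTANCE = new JoinCommuteSketch(false);
    // Hypothetical analogue of the proposed SWAP_OUTER instance.
    static final JoinCommuteSketch SWAP_OUTER = new JoinCommuteSketch(true);

    private JoinCommuteSketch(boolean swapOuterJoins) {
        this.swapOuterJoins = swapOuterJoins;
    }

    /** Returns the commuted join, or null if this instance must not swap it. */
    Join swap(Join join) {
        if (join.type() != JoinType.INNER && !swapOuterJoins) {
            return null; // outer joins are left alone unless swapOuterJoins is set
        }
        return new Join(join.right(), join.left(), mirror(join.type()));
    }

    // Swapping inputs turns LEFT into RIGHT and vice versa; INNER and FULL are symmetric.
    static JoinType mirror(JoinType type) {
        return switch (type) {
            case LEFT -> JoinType.RIGHT;
            case RIGHT -> JoinType.LEFT;
            default -> type;
        };
    }

    public static void main(String[] args) {
        Join j = new Join("emps", "depts", JoinType.LEFT);
        System.out.println(INSTANCE.swap(j));   // null: outer join not swapped
        System.out.println(SWAP_OUTER.swap(j)); // depts RIGHT JOIN emps
    }
}
```

Keeping the outer-join behaviour in a separate instance means existing rule sets that use the default instance see no plan changes, which matches the goal of a quick workaround.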


Thanks,
Maryann


On Mon, Mar 23, 2015 at 1:56 PM, Julian Hyde wrote:

> On Mar 20, 2015, at 3:02 PM, Maryann Xue wrote:
>
> >>> I can't think of a good reason why JoinCommuteRule doesn't swap outer
> > joins.
> >
> > But right now the only call to swap() has swapOuterJoins set to false,
> > so I assumed there was a reason for it. Can we change that?
>
> I don’t remember why. Can you investigate, by running the test suite, and
> make a recommendation?
>
> >
> >>> EnumerableJoin originally built the left, probed the right, and
> > therefore had a smaller cost if the smaller input were on the left.
> >
> > Phoenix actually builds the right and probes the left.
> >
> >>> But we changed it, because the convention in the optimizer world is to
> > build left-deep trees, with the largest input on the left, and smaller,
> > hopefully selective, inputs on the right.
> >
> > So I assume EnumerableJoin should now give the LHS a cheaper cost, right?
> > It doesn't look that way in the code.
>
> Oops, you’re right. EnumerableJoin is more expensive if the larger input
> is placed on the left. I think that is a mistake.
>
> > I'm not sure my understanding is correct, but I think a left-deep tree
> > with the largest relation on the left would most likely benefit
> > nested-loop joins. Phoenix cannot do NL joins, so either a left-deep tree
> > with the largest relation on the right or, if memory limits allow, a
> > right-deep tree with the largest relation on the left is preferable.
>
> Although it would be nice if each join algorithm could choose its cost
> model, I think it would make it a lot more complicated to build re-usable
> rules.
>
> You should consider changing your join to match the convention. (And yes
> we need to change EnumerableJoin also.)
>
> Julian
>
>
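The build/probe discussion quoted above can be made concrete with a minimal in-memory equi-join that, like Phoenix as described in the thread, builds a hash table on the right input and probes it with the left. BuildRightHashJoin and its Row record are hypothetical names for illustration. This is not EnumerableJoin's or Phoenix's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative hash equi-join; NOT EnumerableJoin's or Phoenix's actual code.
public final class BuildRightHashJoin {
    record Row(int key, String value) {}

    /** Builds a hash table on the right input, then streams the left input through it. */
    static List<String> join(List<Row> left, List<Row> right) {
        // Build phase: memory grows with the right input, so it should be the smaller side.
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : right) {
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: the (larger) left input is read once, one row at a time.
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : table.getOrDefault(l.key(), List.of())) {
                out.add(l.value() + "|" + r.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> facts = List.of(new Row(1, "a"), new Row(2, "b"), new Row(1, "c"));
        List<Row> dims = List.of(new Row(1, "x"));
        System.out.println(join(facts, dims)); // [a|x, c|x]
    }
}
```

Only the right input has to fit in memory, which is why the left-deep convention (largest input on the left, smaller and hopefully selective inputs on the right) suits a build-right hash join.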
