I tried with swapOuterJoins = true and got two failures in the calcite/core tests, both caused by a left join being switched to a right join. I think this in turn happens because EnumerableJoin assigns most of the cost to the left side (by mistake). So I went ahead and changed computeSelfCost() in EnumerableJoin, and this time I got 10 failures (all due to explain plan differences).
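
To make the cost asymmetry concrete, here is a toy sketch in Java. It is not Calcite's actual computeSelfCost(): the class JoinCostSketch, its methods, and the 2x weight on the build side are all made up for illustration. It only shows how the side a join builds on decides whether a cost-based planner prefers the swapped form, which for an outer join means turning a left join into a right join.

```java
// Toy model only: the names and the cost formula are hypothetical,
// not taken from Calcite or Phoenix.
class JoinCostSketch {

    /** Hash-join-style cost: build a table on one input, probe it with the other. */
    static double cost(double leftRows, double rightRows, boolean buildLeft) {
        double build = buildLeft ? leftRows : rightRows;
        double probe = buildLeft ? rightRows : leftRows;
        // Assumption: building costs twice as much per row as probing.
        return 2.0 * build + probe;
    }

    /** True if swapping the two inputs lowers the cost under this model. */
    static boolean shouldSwap(double leftRows, double rightRows, boolean buildLeft) {
        return cost(rightRows, leftRows, buildLeft) < cost(leftRows, rightRows, buildLeft);
    }

    public static void main(String[] args) {
        double big = 1_000_000;
        double small = 100;
        // Larger input on the left, join builds on the left: swapping looks cheaper.
        System.out.println(shouldSwap(big, small, true));
        // Larger input on the left, join builds on the right: the plan stays as is.
        System.out.println(shouldSwap(big, small, false));
    }
}
```

Under this model, a join that builds on the left pushes the planner to swap whenever the larger input is on the left, which is the kind of plan change I saw in the failing tests.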
I'll file JIRAs for the above two issues respectively. But I would like to get around this quickly by adding a SWAP_OUTER static instance to JoinCommuteRule. Would that be OK?

Thanks,
Maryann

On Mon, Mar 23, 2015 at 1:56 PM, Julian Hyde <[email protected]> wrote:

> On Mar 20, 2015, at 3:02 PM, Maryann Xue <[email protected]> wrote:
> >
> > >>> I can't think of a good reason why JoinCommuteRule doesn't swap outer
> > joins.
> >
> > But right now the only call to swap() is with swapOuterJoins set to false.
> > So I thought it might have some reason to do so. Can we change that?
>
> I don’t remember why. Can you investigate, by running the test suite, and
> make a recommendation?
>
> >
> > >>> EnumerableJoin originally built the left, probed the right, and
> > therefore had a smaller cost if the smaller input were on the left.
> >
> > Phoenix actually builds the right and probes the left.
> >
> > >>> But we changed it, because the convention in the optimizer world is to
> > build left-deep trees, with the largest input on the left, and smaller,
> > hopefully selective, inputs on the right.
> >
> > So I assume EnumerableJoin now should give LHS a cheaper cost, right? It
> > does not look like so in the code.
>
> Oops, you’re right. EnumerableJoin is more expensive if the larger input
> is placed on the left. I think that is a mistake.
>
> > Don't know if my understanding is correct, but I think a left-deep tree
> > with largest relation on the left would most likely benefit nested loop
> > joins. Phoenix is not able to do NL join, so either a left-deep tree with
> > largest on the right or, if memory limit allows, a right-deep tree with
> > largest on the left is preferable.
>
> Although it would be nice if each join algorithm could choose its cost
> model, I think it would make it a lot more complicated to build re-usable
> rules.
>
> You should consider changing your join to match the convention. (And yes
> we need to change EnumerableJoin also.)
>
> Julian
