I think the complexity mostly comes from the value generator, which is the engineering foundation of current decorrelation.
Best, Danny Chan 在 2019年6月5日 +0800 AM10:45,Chunwei Lei <[email protected]>,写道: > Thanks for raising this, Haisheng. > > Do you mean that we should have better subquery unnesting? > > > > > Best, > Chunwei > > > On Tue, Jun 4, 2019 at 8:27 AM Haisheng Yuan <[email protected]> wrote: > > > Hi, > > > > Since we have commited CALCITE-2969, and in the next release of 1.21, we > > will deprecate EquiJoin and make enumerable hash join support non-equi join > > condition, the optimization door is open to us. > > > > Currently Calcite often generates complicated and inefficient plan for > > boolean context correlated subquery (putting value context subquery aside, > > which is more complex during unnesting). e.g. > > https://issues.apache.org/jira/browse/CALCITE-2857 > > https://issues.apache.org/jira/browse/CALCITE-2874 > > > > There are many more cases if you want to try and find out. Some underlying > > reasons are: > > 1. SemJoin doesn't support non-equi condition > > 2. Correlate comes with Aggregate in many cases > > > > Sometimes, SubqueryRemoveRule generates Correlate with Aggregate, > > sometimes generates Join with Aggregate. Then we use SemiJoinRule to match > > Join(Rel, Agg) pattern to see whether we can transform it to semijoin or > > not. This is counter intuitive, because we already know it will be a > > SemiJoin or AntiSemi when we generate Correlate for the subquery. In > > current way, we may miss the chance to get back SemiJoin for it, and we > > almost lose the opportunity to do IndexScan if there is index available. > > > > In addition, I see 2 rules JoinAddRedundantSemiJoinRule and > > SemiJoinRemoveRule seem to provide the chance to do index scan. But through > > the test cases related with the 2 rules, I don't see any chance for a > > better plan with them and they don't seem to be able to generate plan with > > index scan. I doubt whether they are used at all. Can someone shed some > > light on this and give us some examples, if I get it wrong? > > > > The transformation flow for boolean context sub-query in my head is: > > sub-query -> Correate(Semi/AntiSemi) without Aggregate (SubqueryRemoveRule) > > Correlate(Semi/AntiSemi) -> NestedLoopJoin (correlate implementation rule) > > Correlate(Semi/AntiSemi) -> IndexedNestedLoopJoin (index apply rule) > > Correlate(Semi/AntiSemi) -> SemiJoin/AntiSemiJoin (Semi Correlate to Semi > > Join rule) > > SemiJoin -> InnerJoin(rel, Agg) (SemiJoin to Join rule) > > InnerJoin(rel, Agg) -> InnerJoin(Agg, rel) (Join reorder rule) > > SemiJoin/InnerJoin -> HashJoin/MergeJoin/NLJ (Join implementation rule) > > > > This is a long shot, and will involve tons of changes. > > Any thoughts? > > > > - Haisheng Yuan
