Re: Correlate Join SemiJoin transformation

Yuzhao Chen Tue, 04 Jun 2019 20:09:40 -0700

I think the complexity mostly comes from the value generator, which is the 
engineering foundation of current decorrelation.


Best,
Danny Chan
在 2019年6月5日 +0800 AM10:45，Chunwei Lei <[email protected]>，写道：
> Thanks for raising this, Haisheng.
>
> Do you mean that we should have better subquery unnesting?
>
>
>
>
> Best,
> Chunwei
>
>
> On Tue, Jun 4, 2019 at 8:27 AM Haisheng Yuan <[email protected]> wrote:
>
> > Hi,
> >
> > Since we have commited CALCITE-2969, and in the next release of 1.21, we
> > will deprecate EquiJoin and make enumerable hash join support non-equi join
> > condition, the optimization door is open to us.
> >
> > Currently Calcite often generates complicated and inefficient plan for
> > boolean context correlated subquery (putting value context subquery aside,
> > which is more complex during unnesting). e.g.
> > https://issues.apache.org/jira/browse/CALCITE-2857
> > https://issues.apache.org/jira/browse/CALCITE-2874
> >
> > There are many more cases if you want to try and find out. Some underlying
> > reasons are:
> > 1. SemJoin doesn't support non-equi condition
> > 2. Correlate comes with Aggregate in many cases
> >
> > Sometimes, SubqueryRemoveRule generates Correlate with Aggregate,
> > sometimes generates Join with Aggregate. Then we use SemiJoinRule to match
> > Join(Rel, Agg) pattern to see whether we can transform it to semijoin or
> > not. This is counter intuitive, because we already know it will be a
> > SemiJoin or AntiSemi when we generate Correlate for the subquery. In
> > current way, we may miss the chance to get back SemiJoin for it, and we
> > almost lose the opportunity to do IndexScan if there is index available.
> >
> > In addition, I see 2 rules JoinAddRedundantSemiJoinRule and
> > SemiJoinRemoveRule seem to provide the chance to do index scan. But through
> > the test cases related with the 2 rules, I don't see any chance for a
> > better plan with them and they don't seem to be able to generate plan with
> > index scan. I doubt whether they are used at all. Can someone shed some
> > light on this and give us some examples, if I get it wrong?
> >
> > The transformation flow for boolean context sub-query in my head is:
> > sub-query -> Correate(Semi/AntiSemi) without Aggregate (SubqueryRemoveRule)
> > Correlate(Semi/AntiSemi) -> NestedLoopJoin (correlate implementation rule)
> > Correlate(Semi/AntiSemi) -> IndexedNestedLoopJoin (index apply rule)
> > Correlate(Semi/AntiSemi) -> SemiJoin/AntiSemiJoin (Semi Correlate to Semi
> > Join rule)
> > SemiJoin -> InnerJoin(rel, Agg) (SemiJoin to Join rule)
> > InnerJoin(rel, Agg) -> InnerJoin(Agg, rel) (Join reorder rule)
> > SemiJoin/InnerJoin -> HashJoin/MergeJoin/NLJ (Join implementation rule)
> >
> > This is a long shot, and will involve tons of changes.
> > Any thoughts?
> >
> > - Haisheng Yuan

Re: Correlate Join SemiJoin transformation

Reply via email to