Hi, Since we have commited CALCITE-2969, and in the next release of 1.21, we will deprecate EquiJoin and make enumerable hash join support non-equi join condition, the optimization door is open to us.
Currently Calcite often generates complicated and inefficient plan for boolean context correlated subquery (putting value context subquery aside, which is more complex during unnesting). e.g. https://issues.apache.org/jira/browse/CALCITE-2857 https://issues.apache.org/jira/browse/CALCITE-2874 There are many more cases if you want to try and find out. Some underlying reasons are: 1. SemJoin doesn't support non-equi condition 2. Correlate comes with Aggregate in many cases Sometimes, SubqueryRemoveRule generates Correlate with Aggregate, sometimes generates Join with Aggregate. Then we use SemiJoinRule to match Join(Rel, Agg) pattern to see whether we can transform it to semijoin or not. This is counter intuitive, because we already know it will be a SemiJoin or AntiSemi when we generate Correlate for the subquery. In current way, we may miss the chance to get back SemiJoin for it, and we almost lose the opportunity to do IndexScan if there is index available. In addition, I see 2 rules JoinAddRedundantSemiJoinRule and SemiJoinRemoveRule seem to provide the chance to do index scan. But through the test cases related with the 2 rules, I don't see any chance for a better plan with them and they don't seem to be able to generate plan with index scan. I doubt whether they are used at all. Can someone shed some light on this and give us some examples, if I get it wrong? The transformation flow for boolean context sub-query in my head is: sub-query -> Correate(Semi/AntiSemi) without Aggregate (SubqueryRemoveRule) Correlate(Semi/AntiSemi) -> NestedLoopJoin (correlate implementation rule) Correlate(Semi/AntiSemi) -> IndexedNestedLoopJoin (index apply rule) Correlate(Semi/AntiSemi) -> SemiJoin/AntiSemiJoin (Semi Correlate to Semi Join rule) SemiJoin -> InnerJoin(rel, Agg) (SemiJoin to Join rule) InnerJoin(rel, Agg) -> InnerJoin(Agg, rel) (Join reorder rule) SemiJoin/InnerJoin -> HashJoin/MergeJoin/NLJ (Join implementation rule) This is a long shot, and will involve tons of changes. Any thoughts? - Haisheng Yuan
