I think they might be orthogonal. It is all about sub-query.
On 2020/07/21 05:48:54, Danny Chan <[email protected]> wrote: > If it is only constant NOT IN predicate, how difficult it is to rewrite it > into a normal composite AND predicate before entering the planning phrase ? > > Best, > Danny Chan > 在 2020年7月21日 +0800 PM12:35,Haisheng Yuan <[email protected]>,写道: > > Thanks Jinpeng for providing a good example for not in subquery. > > > > I 100% agree with you that correlated query won't be represented by > > ANTI_NOTIN join type, and it is not the proposal's intention. Here what we > > are discussing is not to use ANTI_NOTIN to represent all the NOT IN > > sub-queries, that is impossible. Instead, if you take a close look at the > > example query, it is a simple uncorrelated NOT IN sub-query. That is the > > target. Let's focus on that kind of query, ask ourselves this question: Can > > such a simple query be transformed into a ANTI join to make the plan > > efficient? > > > > Sadly no. The reality is that this kind of query is not uncommon, may be > > much more common than correlated NOT IN sub-queries. > > > > > > Reply to Julian: > > > > How about making a sub-query type (in RexSubQuery), so it is gone > > > > before we reach algebra. > > It will be nice to have a NOT_IN subquery type, without expanding NOT IN to > > NOT(IN....). > > However, if there is no ANTI_NOTIN in the join type (without reaching > > algebra), does that mean the optimizer still can't generate efficient plan > > for simple NOT IN sub-queries? > > > > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > > > the opposite of NOT IN?! > > It depends how people interpret ANTI. You interpret it as "opposite", I > > interpret it as "ANTI JOIN", means "anti join for NOT IN, instead of NOT > > EXISTS". But it is just a naming issue, I am OK to change it whatever name > > that makes sense to the community, as long as it can convey the meaning. > > > > Thanks, > > Haisheng > > > > On 2020/07/21 03:02:20, Jinpeng Wu <[email protected]> wrote: > > > Hi. > > > > > > In some SQL engine, the query > > > select * from A where c1 not in ( select c1 from B where B.c2 = A.c2); > > > is transformed to a plan like > > > select * from A LEFT ANTI JOIN B on A.c2 = B.c2 AND (A.c1 = B.c1 OR A.c1 > > > is > > > null OR B.c1 is null); > > > > > > Here, the "LEFT ANTI JOIN" is nothing more than traditional definition. > > > One > > > thing seems to be a problem is that A.c1 cannot be used as a join key in > > > the new plan. However, the problem is also there for ANTI_NOTIN, and even > > > other NOT-IN-SUBQUERY physical implementations. > > > > > > Thanks, > > > Qiupeng > > > > > > On Tue, Jul 21, 2020 at 5:30 AM Julian Hyde <[email protected]> wrote: > > > > > > > How about making a sub-query type (in RexSubQuery), so it is gone > > > > before we reach algebra. > > > > > > > > ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is > > > > the opposite of NOT IN?! > > > > > > > > On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan <[email protected]> wrote: > > > > > > > > > > Typo: > > > > > We can just add a security guard saying that it is supported. > > > > > Should be > > > > > We can just add a security guard saying that it is NOT supported. > > > > > > > > > > On 2020/07/20 19:57:34, Haisheng Yuan <[email protected]> wrote: > > > > > > I am not sure I got your implication by "pollute". If you mean > > > > changes, yes, it requires some changes in rules. Do we need to change > > > > enumerables? Not necessary. We can just add a security guard saying > > > > that it > > > > is supported. Not everyone requires the Enumerable operators to support > > > > everything. More importantly, currently there is no logic or rules to > > > > translate sub-query directly to SEMI/ANTI joins, let alone translating > > > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) > > > > before > > > > entering RelNode land. That means we don't even have the chance to > > > > generate > > > > the NOT IN anti join. Is that still a concern? > > > > > > > > > > > > Even if some day, some contributor extends Calcite's parser and > > > > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN > > > > anti join, we still have chance to disable it. Is that still a concern? > > > > > > > > > > > > There are many ways to play it safe. > > > > > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > > > operator > > > > > > > (with its corresponding enumerable implementation)? > > > > > > It doesn't sound cool. It requires much more work. You have to > > > > duplicate all the rules, metadata handler that deal with LogicalJoin, > > > > and > > > > for some rule that matches Join base class, you have to check it is a > > > > LogicalJoin or the logical operator for ANTI_NOTIN. > > > > > > > > > > > > On 2020/07/20 08:28:42, Ruben Q L <[email protected]> wrote: > > > > > > > I have some concerns that this new type would "pollute" the > > > > > > > existing > > > > Join > > > > > > > logic, rules and enumerable implementations. > > > > > > > > > > > > > > Brainstorming: maybe we could consider it as a separate logical > > > > operator > > > > > > > (with its corresponding enumerable implementation)? > > > > > > > > > > > > > > > > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan > > > > > > > <[email protected]> > > > > a > > > > > > > écrit : > > > > > > > > > > > > > > > I agree that NOT IN is toxic, and it is error-prone. > > > > > > > > But you can't prevent people writing SQL with not in > > > > > > > > sub-queries, > > > > would > > > > > > > > you rather let optimizer generate inefficient plan? > > > > > > > > > > > > > > > > - Haisheng > > > > > > > > > > > > > > > > ------------------------------------------------------------------ > > > > > > > > 发件人:Julian Hyde<[email protected]> > > > > > > > > 日 期:2020年07月20日 11:56:35 > > > > > > > > 收件人:[email protected]<[email protected]> > > > > > > > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > > > > > > > > > > > > > Yuck! > > > > > > > > > > > > > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > > > > > > > > > > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan <[email protected]> > > > > wrote: > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS > > > > > > > > > subquery > > > > (thanks > > > > > > > > to Ruben for reminding). > > > > > > > > > For some simple boolean context NOT_IN subquery, we can't > > > > transform it > > > > > > > > to ANTI join. e.g.: > > > > > > > > > > > > > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- > > > > > > > > > bar.b is > > > > > > > > nullable > > > > > > > > > > > > > > > > > > Because if there is a null value in the results of subquery, > > > > > > > > > the > > > > NOT IN > > > > > > > > predicate will return false, the whole query returns empty. And > > > > > > > > in > > > > Calcite, > > > > > > > > the plan for this kind of query is inefficient. > > > > > > > > > > > > > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can > > > > generate > > > > > > > > more efficient plan, as long as the query executor support it. > > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > > > > > Haisheng Yuan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
