I am not sure I got your implication by "pollute". If you mean changes, yes, it requires some changes in rules. Do we need to change enumerables? Not necessary. We can just add a security guard saying that it is supported. Not everyone requires the Enumerable operators to support everything. More importantly, currently there is no logic or rules to translate sub-query directly to SEMI/ANTI joins, let alone translating directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before entering RelNode land. That means we don't even have the chance to generate the NOT IN anti join. Is that still a concern?
Even if some day, some contributor extends Calcite's parser and SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN anti join, we still have chance to disable it. Is that still a concern? There are many ways to play it safe. > Brainstorming: maybe we could consider it as a separate logical operator > (with its corresponding enumerable implementation)? It doesn't sound cool. It requires much more work. You have to duplicate all the rules, metadata handler that deal with LogicalJoin, and for some rule that matches Join base class, you have to check it is a LogicalJoin or the logical operator for ANTI_NOTIN. On 2020/07/20 08:28:42, Ruben Q L <[email protected]> wrote: > I have some concerns that this new type would "pollute" the existing Join > logic, rules and enumerable implementations. > > Brainstorming: maybe we could consider it as a separate logical operator > (with its corresponding enumerable implementation)? > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan <[email protected]> a > écrit : > > > I agree that NOT IN is toxic, and it is error-prone. > > But you can't prevent people writing SQL with not in sub-queries, would > > you rather let optimizer generate inefficient plan? > > > > - Haisheng > > > > ------------------------------------------------------------------ > > 发件人:Julian Hyde<[email protected]> > > 日 期:2020年07月20日 11:56:35 > > 收件人:[email protected]<[email protected]> > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > Yuck! > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan <[email protected]> wrote: > > > > > > Hi all, > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery (thanks > > to Ruben for reminding). > > > For some simple boolean context NOT_IN subquery, we can't transform it > > to ANTI join. e.g.: > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > > nullable > > > > > > Because if there is a null value in the results of subquery, the NOT IN > > predicate will return false, the whole query returns empty. And in Calcite, > > the plan for this kind of query is inefficient. > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can generate > > more efficient plan, as long as the query executor support it. > > > > > > Thoughts? > > > > > > Haisheng Yuan > > > > > > > > >
