How about making a sub-query type (in RexSubQuery), so it is gone before we reach algebra.
ANTI_NOTIN is a terrible name. ANTI means 'opposite' to ANTI_NOTIN is the opposite of NOT IN?! On Mon, Jul 20, 2020 at 1:00 PM Haisheng Yuan <[email protected]> wrote: > > Typo: > We can just add a security guard saying that it is supported. > Should be > We can just add a security guard saying that it is NOT supported. > > On 2020/07/20 19:57:34, Haisheng Yuan <[email protected]> wrote: > > I am not sure I got your implication by "pollute". If you mean changes, > > yes, it requires some changes in rules. Do we need to change enumerables? > > Not necessary. We can just add a security guard saying that it is > > supported. Not everyone requires the Enumerable operators to support > > everything. More importantly, currently there is no logic or rules to > > translate sub-query directly to SEMI/ANTI joins, let alone translating > > directly to ANTI_NOTIN. Currently NOT IN is expanded to NOT(IN ...) before > > entering RelNode land. That means we don't even have the chance to generate > > the NOT IN anti join. Is that still a concern? > > > > Even if some day, some contributor extends Calcite's parser and > > SubqueryRemovalRule to be able to transform NOT_IN subquery into NOT IN > > anti join, we still have chance to disable it. Is that still a concern? > > > > There are many ways to play it safe. > > > > > Brainstorming: maybe we could consider it as a separate logical operator > > > (with its corresponding enumerable implementation)? > > It doesn't sound cool. It requires much more work. You have to duplicate > > all the rules, metadata handler that deal with LogicalJoin, and for some > > rule that matches Join base class, you have to check it is a LogicalJoin or > > the logical operator for ANTI_NOTIN. > > > > On 2020/07/20 08:28:42, Ruben Q L <[email protected]> wrote: > > > I have some concerns that this new type would "pollute" the existing Join > > > logic, rules and enumerable implementations. > > > > > > Brainstorming: maybe we could consider it as a separate logical operator > > > (with its corresponding enumerable implementation)? > > > > > > > > > Le lun. 20 juil. 2020 à 06:08, Haisheng Yuan <[email protected]> a > > > écrit : > > > > > > > I agree that NOT IN is toxic, and it is error-prone. > > > > But you can't prevent people writing SQL with not in sub-queries, would > > > > you rather let optimizer generate inefficient plan? > > > > > > > > - Haisheng > > > > > > > > ------------------------------------------------------------------ > > > > 发件人:Julian Hyde<[email protected]> > > > > 日 期:2020年07月20日 11:56:35 > > > > 收件人:[email protected]<[email protected]> > > > > 主 题:Re: [DISCUSS] New Join Type: ANTI_NOTIN > > > > > > > > Yuck! > > > > > > > > NOT IN is toxic. I'd rather keep it out of the algebra. > > > > > > > > On Sun, Jul 19, 2020 at 8:24 PM Haisheng Yuan <[email protected]> wrote: > > > > > > > > > > Hi all, > > > > > > > > > > Currently, JoinRelType.ANTI only represents NOT_EXISTS subquery > > > > > (thanks > > > > to Ruben for reminding). > > > > > For some simple boolean context NOT_IN subquery, we can't transform it > > > > to ANTI join. e.g.: > > > > > > > > > > SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is > > > > nullable > > > > > > > > > > Because if there is a null value in the results of subquery, the NOT > > > > > IN > > > > predicate will return false, the whole query returns empty. And in > > > > Calcite, > > > > the plan for this kind of query is inefficient. > > > > > > > > > > If we have ANTI_NOTIN to represent this kind of join, we can generate > > > > more efficient plan, as long as the query executor support it. > > > > > > > > > > Thoughts? > > > > > > > > > > Haisheng Yuan > > > > > > > > > > > > > > > > > > >
