As we know, "A “semi-join” between two tables returns rows from the first table 
where one or more matches are found in the second table. The difference between 
a semi-join and a conventional join is that rows in the first table will be 
returned at most once. Even if the second table contains two matches for a row 
in the first table, only one copy of the row will be returned."  (ref 
http://dbspecialists.com/speeding-queries-semi-joins-anti-joins-oracle-evaluates-exists-not-exists-not)


I wondered why must SemiJoin extend from EquiJoin and have only equi join 
condition in Calcite?


e.g.  select * from l where l.a in (select r.c from r where l.b > r.d)
According to the definition, the above SQL can be converted to semi-join. There 
is an equi join condition: l.a = r.c, which can be used as shuffle key on 
distributed environment.
However, it can not be converted to SemiJoin in Calcite now. Because there is a 
non-equi join conditions: lb. > r.d.





Reply via email to