GitHub user zellerh commented on a diff in the pull request:
https://github.com/apache/trafodion/pull/1530#discussion_r182911028
--- Diff: core/sql/optimizer/NormRelExpr.cpp ---
@@ -2767,25 +2767,26 @@ Here t2.a is a unique key of table t2.
The following transformation is made
Semi Join {pred : t1.b = t2.a} Join {pred : t1.b = t2.a}
/ \ -------> / \
- / \ / \
-Scan t1 Scan t2 Scan t1 Scan t2
+ / \ / \
+ Scan t1 Scan t2 Scan t1 Scan t2
b) If the right child is not unique in the joining column then
we transform the semijoin into an inner join followed by a groupby
as the join's right child. This transformation is enabled by default
-only if the right side is an IN list, otherwise a CQD has to be used.
+only if the right side is an IN list or if the groupby's reduction
+ratio is greater than 5.0, otherwise a CQD has to be used.
select t1.a
from t1
where t1.b in (1,2,3,4,...,101) ;
- Semi Join {pred : t1.b = t2.a} Join {pred : t1.b = InList.col}
+ Semi Join {pred : t1.b = InList.col} Join {pred : t1.b = InList.col}
/ \ -------> / \
/ \ / \
-Scan t1 Scan t2 Scan t1 GroupBy {group cols:
InList.col}
+Scan t1 TupleList Scan t1 GroupBy {group cols:
InList.col}
|
--- End diff --
Nice to make the picture consistent, but from the code it looks like we
apply this transformation to right children other than a TupleList, so maybe
"Scan t2" or "Q2" would be a better name for that child?
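
For reference, the equivalence the comment block describes in case b) can be sketched in a few lines. This is only an illustration, not Trafodion code: `Row`, `semiJoin`, `groupBy` and `innerJoin` are hypothetical names, and the right child is modeled as a plain list of join-column values, so it could stand for a TupleList, a scan of t2, or any other subquery.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical row of t1 with columns (a, b); not a Trafodion type.
struct Row { int a; int b; };

// Semijoin: emit t1.a at most once per t1 row, no matter how many
// values on the right side match t1.b.
std::vector<int> semiJoin(const std::vector<Row>& t1,
                          const std::vector<int>& right) {
    std::vector<int> out;
    for (const Row& r : t1) {
        for (int v : right) {
            if (r.b == v) { out.push_back(r.a); break; }
        }
    }
    return out;
}

// GroupBy on the join column: collapse duplicate values so the
// right side becomes unique in that column.
std::vector<int> groupBy(const std::vector<int>& right) {
    std::set<int> distinct(right.begin(), right.end());
    return std::vector<int>(distinct.begin(), distinct.end());
}

// Inner join: emit t1.a once per matching (t1 row, right value) pair.
std::vector<int> innerJoin(const std::vector<Row>& t1,
                           const std::vector<int>& right) {
    std::vector<int> out;
    for (const Row& r : t1) {
        for (int v : right) {
            if (r.b == v) out.push_back(r.a);
        }
    }
    return out;
}
```

With a right side that contains duplicates, `innerJoin(t1, groupBy(right))` returns the same rows as `semiJoin(t1, right)`, while `innerJoin(t1, right)` alone would repeat rows of t1. Nothing in the sketch depends on where the right-side values come from, which is why a neutral label for that child seems to fit the picture better.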
---