>Filter(Scan(emp), gender = ‘F’) is the “build” side, and you can imagine
>implementing it by populating a hash-table with distinct keys before
>starting to read from the “dept” table.
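For reference, the approach quoted above can be sketched roughly as follows. This is a minimal illustration, not Calcite code: the `Emp` and `Dept` records, the `semiJoin` method, the class name and the sample rows are all hypothetical. The filtered emp input is the build side (a set of distinct deptno keys), and dept is streamed and probed afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BuildOnEmpSemiJoin {
    // Hypothetical row types for illustration only (not Calcite classes).
    record Emp(String name, int deptno, char gender) {}
    record Dept(int deptno, String dname) {}

    // Returns the dept rows that have at least one female employee.
    static List<Dept> semiJoin(List<Emp> emps, List<Dept> depts) {
        // Build side: distinct join keys of Filter(Scan(emp), gender = 'F').
        Set<Integer> keys = new HashSet<>();
        for (Emp e : emps) {
            if (e.gender() == 'F') {
                keys.add(e.deptno());
            }
        }
        // Probe side: stream dept and emit each row whose key was seen.
        List<Dept> out = new ArrayList<>();
        for (Dept d : depts) {
            if (keys.contains(d.deptno())) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, purely illustrative.
        List<Emp> emps = List.of(
            new Emp("Alice", 10, 'F'),
            new Emp("Bob", 20, 'M'),
            new Emp("Carol", 10, 'F'),
            new Emp("Dana", 30, 'F'));
        List<Dept> depts = List.of(
            new Dept(10, "Sales"), new Dept(20, "Marketing"),
            new Dept(30, "HR"), new Dept(40, "Ops"));
        System.out.println(semiJoin(emps, depts));
    }
}
```

Memory use here is proportional to the number of distinct deptno values among female employees.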

There is more than one way to skin a hash semi-join.

Imagine the following physical plan:
1) Scan(dept) first and build a Map<deptno, List<dept.row>>.
2) Evaluate Filter(Scan(emp), gender = ‘F’) and, for each row, check whether
its join key is in the map. If it is, remove the entry from the map and
return its dept rows (removing the entry guarantees each dept row is
returned at most once). If it is not, just ignore the emp row.

This can be more efficient when dept is smaller than Filter(Scan(emp),
gender = ‘F’), since the hash table is then built over the smaller input.
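The two-step plan above might look roughly like this. Again a hypothetical sketch: `Emp`, `Dept`, `semiJoin`, the class name and the sample rows are made up for illustration. `Map.remove` returns the matching dept rows on the first hit and `null` afterwards, which is what keeps duplicates out of the result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BuildOnDeptSemiJoin {
    // Hypothetical row types for illustration only (not Calcite classes).
    record Emp(String name, int deptno, char gender) {}
    record Dept(int deptno, String dname) {}

    static List<Dept> semiJoin(List<Emp> emps, List<Dept> depts) {
        // Step 1: Scan(dept) and build Map<deptno, List<dept row>>.
        Map<Integer, List<Dept>> byDeptno = new HashMap<>();
        for (Dept d : depts) {
            byDeptno.computeIfAbsent(d.deptno(), k -> new ArrayList<>()).add(d);
        }
        // Step 2: stream Filter(Scan(emp), gender = 'F') and probe the map.
        List<Dept> out = new ArrayList<>();
        for (Emp e : emps) {
            if (e.gender() != 'F') {
                continue; // row rejected by the filter
            }
            // First hit removes the entry and returns its rows; later hits
            // for the same deptno get null, so each dept row is emitted once.
            List<Dept> matched = byDeptno.remove(e.deptno());
            if (matched != null) {
                out.addAll(matched);
            }
            // Keys not in the map are simply ignored.
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, purely illustrative.
        List<Emp> emps = List.of(
            new Emp("Alice", 10, 'F'),
            new Emp("Bob", 20, 'M'),
            new Emp("Carol", 10, 'F'),
            new Emp("Dana", 30, 'F'));
        List<Dept> depts = List.of(
            new Dept(10, "Sales"), new Dept(20, "Marketing"),
            new Dept(30, "HR"), new Dept(40, "Ops"));
        System.out.println(semiJoin(emps, depts));
    }
}
```

Here memory is proportional to the size of dept, and the output order follows the emp stream rather than the dept table.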

I suggest the following:
1) Let the right side of the semi-join be the one that is semi-joined (as
in your example above).
2) The "build" side (in your terms above) is a physical-plan concern, so it
is irrelevant at the logical level. For example, for the enumerable
convention we could have EnumerableHashJoinSemi (builds the hash table over
the left relation) and EnumerableHashJoinSemiRight (builds the hash table
over the right relation).
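To make the logical/physical split concrete, here is a hypothetical sketch of how a planner might pick between the two proposed operators. The enum merely mirrors the names suggested above; it is not an existing Calcite API, and "build over the smaller estimated input" is a simplifying assumption standing in for a real cost model. Both choices implement the same logical semi-join (left = dept, right = the filtered emp side).

```java
public class SemiJoinBuildSideChooser {
    // Hypothetical: mirrors the operator names proposed above.
    enum PhysicalSemiJoin {
        ENUMERABLE_HASH_JOIN_SEMI,       // builds the hash table over the left relation
        ENUMERABLE_HASH_JOIN_SEMI_RIGHT  // builds the hash table over the right relation
    }

    // Simplified heuristic (an assumption): build over the side with the
    // smaller estimated row count. The logical semi-join is unchanged.
    static PhysicalSemiJoin choose(double leftRowCount, double rightRowCount) {
        return leftRowCount <= rightRowCount
            ? PhysicalSemiJoin.ENUMERABLE_HASH_JOIN_SEMI
            : PhysicalSemiJoin.ENUMERABLE_HASH_JOIN_SEMI_RIGHT;
    }

    public static void main(String[] args) {
        // Small dept (left) vs. large filtered emp (right): build over dept.
        System.out.println(choose(4, 10_000));
    }
}
```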

Vladimir
