Ruben Q L created CALCITE-3829:
----------------------------------
Summary: MergeJoinEnumerator should not use inputs enumerators
until it is really required
Key: CALCITE-3829
URL: https://issues.apache.org/jira/browse/CALCITE-3829
Project: Calcite
Issue Type: Improvement
Components: core
Affects Versions: 1.21.0
Reporter: Ruben Q L
Assignee: Ruben Q L
{{EnumerableDefaults#MergeJoinEnumerator}} provides an {{Enumerator}} that
performs a merge join between two sorted inputs. Sorting those inputs can be
very expensive, so we should skip it whenever possible. Right now, the
enumerators of both merge join inputs are created when {{MergeJoinEnumerator}}
is constructed; however, in some cases the creation of one of them can be
skipped: if the outer (i.e. left) enumerator returns no results, there is no
need to access (and sort) the inner (i.e. right) enumerator. For this reason,
we should delay the creation of the inner enumerator until we are sure it is
really required: when the first element of the outer enumerator is fetched.
This strategy is already in place in other join algorithms in
{{EnumerableDefaults}} (e.g. {{nestedLoopJoinOptimized}}, {{semiEquiJoin}}),
and it should be easy to apply to {{MergeJoinEnumerator}}.
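
To illustrate the idea, here is a minimal sketch of the lazy creation described
above. It is not the actual Calcite code: it uses plain {{java.util.Iterator}}
and {{Supplier}} instead of linq4j's {{Enumerator}}, and the class and method
names ({{LazyMergeJoinSketch}}, {{sorted}}, {{mergeJoin}}) are hypothetical. It
also assumes, for brevity, that keys are distinct within each input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; not Calcite's MergeJoinEnumerator. */
final class LazyMergeJoinSketch {
  private LazyMergeJoinSketch() {}

  /**
   * Returns a supplier that sorts a copy of {@code rows} each time it is
   * invoked. {@code sortCount[0]} records how many sorts actually happened.
   */
  static Supplier<Iterator<Integer>> sorted(List<Integer> rows, int[] sortCount) {
    return () -> {
      sortCount[0]++;
      List<Integer> copy = new ArrayList<>(rows);
      Collections.sort(copy);
      return copy.iterator();
    };
  }

  /**
   * Merge-joins two sorted inputs on key equality and returns the matching
   * keys. Assumption: keys are distinct within each input.
   */
  static List<Integer> mergeJoin(Supplier<Iterator<Integer>> outerSupplier,
      Supplier<Iterator<Integer>> innerSupplier) {
    List<Integer> result = new ArrayList<>();
    Iterator<Integer> outer = outerSupplier.get();
    if (!outer.hasNext()) {
      // Outer input is empty: the inner supplier is never invoked,
      // so the (potentially expensive) inner sort is skipped entirely.
      return result;
    }
    int left = outer.next();
    // The first outer element has been fetched: only now is the inner
    // input really required, so this is where it gets created.
    Iterator<Integer> inner = innerSupplier.get();
    if (!inner.hasNext()) {
      return result;
    }
    int right = inner.next();
    while (true) {
      if (left == right) {
        result.add(left);
        if (!outer.hasNext() || !inner.hasNext()) {
          break;
        }
        left = outer.next();
        right = inner.next();
      } else if (left < right) {
        if (!outer.hasNext()) {
          break;
        }
        left = outer.next();
      } else {
        if (!inner.hasNext()) {
          break;
        }
        right = inner.next();
      }
    }
    return result;
  }
}
```

With an empty outer input, the counter passed to the inner {{sorted}} supplier
stays at 0 after {{mergeJoin}} returns, i.e. the inner sort never runs. With a
non-empty outer input, the inner supplier is invoked exactly once.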