[ https://issues.apache.org/jira/browse/FLINK-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679107#comment-14679107 ]
Fabian Hueske commented on FLINK-2107:
--------------------------------------
[~Zentol] is right. This is an optimization that avoids copying the probe-side
record when there is only one matching build-side record. 1-n joins where the
build side contains only unique keys are quite common, which is why this
optimization can make a difference.
The probe-side records need to be copied because the user-defined join
function can modify all incoming records. If we did not create a new copy for
each join function call, the second call of the join function might receive a
probe-side record that was already modified by the first call. That violates
the assumption of independent function calls and produces wrong results.
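
To illustrate the point, here is a minimal sketch. It is not the actual Flink
iterator code: the names `ProbeCopySketch`, `Rec`, and `joinProbe` are made up
for this example, and the `copyProbe` flag exists only to show what goes wrong
when the copy is skipped. It shows the single-match shortcut and the per-call
copy for a join function that mutates its probe-side input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

public class ProbeCopySketch {

    /** Hypothetical mutable record; stands in for a user-visible tuple. */
    static final class Rec {
        int key;
        String value;

        Rec(int key, String value) {
            this.key = key;
            this.value = value;
        }

        Rec copy() {
            return new Rec(key, value);
        }
    }

    /**
     * Calls the join function once per build-side match of a probe record.
     * copyProbe = false reproduces the bug: all calls share one probe object.
     */
    static List<String> joinProbe(List<Rec> buildMatches, Rec probe,
                                  BiFunction<Rec, Rec, String> joinFn,
                                  boolean copyProbe) {
        List<String> out = new ArrayList<>();
        if (buildMatches.size() == 1) {
            // Exactly one match: the probe record is handed to the user
            // function only once, so no copy is needed.
            out.add(joinFn.apply(buildMatches.get(0), probe));
            return out;
        }
        for (Rec build : buildMatches) {
            // Several matches: each call gets its own copy of the probe record.
            Rec p = copyProbe ? probe.copy() : probe;
            out.add(joinFn.apply(build, p));
        }
        return out;
    }

    public static void main(String[] args) {
        // A user function that modifies its probe-side input.
        BiFunction<Rec, Rec, String> fn = (b, p) -> {
            p.value = p.value + "+" + b.value;
            return p.value;
        };
        List<Rec> matches = Arrays.asList(new Rec(1, "a"), new Rec(1, "b"));

        // With copies: prints [p+a, p+b]
        System.out.println(joinProbe(matches, new Rec(1, "p"), fn, true));
        // Without copies: prints [p+a, p+a+b], the second call sees the
        // modification made by the first one.
        System.out.println(joinProbe(matches, new Rec(1, "p"), fn, false));
    }
}
```

In the unique-key case every probe record takes the single-match branch, so no
copies are made at all.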
> Implement Hash Outer Join algorithm
> -----------------------------------
>
> Key: FLINK-2107
> URL: https://issues.apache.org/jira/browse/FLINK-2107
> Project: Flink
> Issue Type: Sub-task
> Components: Local Runtime
> Reporter: Fabian Hueske
> Assignee: Chiwan Park
> Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a hash outer join algorithm that can cover
> left and right outer joins.
> The implementation can be based on the regular hash join iterators (for
> example `ReusingBuildFirstHashMatchIterator` and
> `NonReusingBuildFirstHashMatchIterator`; see also the `MatchDriver` class).
> The Reusing and NonReusing variants differ in whether object instances are
> reused or new objects are created. I would start with the NonReusing variant,
> which is safer from a user's point of view and should also be easier to
> implement.