[ 
https://issues.apache.org/jira/browse/FLINK-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679107#comment-14679107
 ] 

Fabian Hueske commented on FLINK-2107:
--------------------------------------

[~Zentol] is right. This is an optimization to avoid copying the probe-side 
record if there is only one matching build-side record. 1-n joins where the 
build side contains only unique keys are quite common, which is why this 
optimization can make a difference.

The probe-side records need to be copied because the user-defined join 
function can modify all incoming records. If we did not create a new copy for 
each join function call, the second call of the join function might receive a 
probe-side record that was already modified by the first call. That would 
violate the assumption of independent function calls and produce wrong 
results.
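
To illustrate the point, here is a minimal, self-contained sketch. It is not 
the actual Flink iterator code: `Rec`, `join`, and `joinAll` are hypothetical 
names made up for this example, and the copy decision is simplified to a check 
on the number of matching build-side records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProbeCopySketch {

    // Hypothetical mutable record type standing in for a user record.
    static final class Rec {
        int key;
        String value;

        Rec(int key, String value) {
            this.key = key;
            this.value = value;
        }

        Rec copy() {
            return new Rec(key, value);
        }
    }

    // Hypothetical user-defined join function that modifies its probe-side input.
    static String join(Rec build, Rec probe) {
        probe.value = probe.value + "+" + build.value;
        return probe.value;
    }

    // Joins one probe-side record with all matching build-side records.
    // A copy is only made when there is more than one match; with a single
    // match the record is handed to the join function exactly once.
    static List<String> joinAll(List<Rec> buildMatches, Rec probe, boolean copyProbe) {
        List<String> out = new ArrayList<>();
        boolean needCopy = copyProbe && buildMatches.size() > 1;
        for (Rec build : buildMatches) {
            Rec p = needCopy ? probe.copy() : probe;
            out.add(join(build, p));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> matches = Arrays.asList(new Rec(1, "a"), new Rec(1, "b"));
        // With copies, each call sees the original probe record.
        System.out.println(joinAll(matches, new Rec(1, "p"), true));  // [p+a, p+b]
        // Without copies, the second call sees the first call's modification.
        System.out.println(joinAll(matches, new Rec(1, "p"), false)); // [p+a, p+a+b]
    }
}
```

In the second call the result for build record "b" is "p+a+b" instead of 
"p+b", which is the kind of wrong result described above.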

> Implement Hash Outer Join algorithm
> -----------------------------------
>
>                 Key: FLINK-2107
>                 URL: https://issues.apache.org/jira/browse/FLINK-2107
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Chiwan Park
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a hash outer join algorithm that can cover 
> left and right outer joins.
> The implementation can be based on the regular hash join iterators (for 
> example `ReusingBuildFirstHashMatchIterator` and 
> `NonReusingBuildFirstHashMatchIterator`; see also the `MatchDriver` class).
> The Reusing and NonReusing variants differ in whether object instances are 
> reused or new objects are created. I would start with the NonReusing variant 
> which is safer from a user's point of view and should also be easier to 
> implement.



