[
https://issues.apache.org/jira/browse/FLINK-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662083#comment-14662083
]
Chesnay Schepler commented on FLINK-2107:
-----------------------------------------
This looks like an optimization to me. You could probably replace the whole
block from L116 to L138 with
{code:java}
while (running && ((nextBuildSideRecord = buildSideIterator.next()) != null)) {
    probeCopy = this.probeSideSerializer.copy(probeRecord);
    matchFunction.join(nextBuildSideRecord, probeCopy, collector);
}
{code}
However, this would always create a copy of the probe record, even if there is
only a single match. Avoiding that copy is what the following check is for:
{code:java}
if ((tmpRec = buildSideIterator.next()) != null) {
{code}
If this is true, we have already fetched two build-side values without calling
join, and as such have to deal with both of them outside the loop.
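To make the trade-off concrete, here is a minimal, self-contained sketch of
both variants. It is not the Flink code: the class `LazyCopyJoinSketch` and the
helpers `nextOrNull` and `join` are hypothetical, a `StringBuilder` stands in
for the probe record, and `new StringBuilder(probe)` stands in for
`probeSideSerializer.copy(probeRecord)`. Each method returns the number of
copies it made, assuming the join function may mutate the probe record it
receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LazyCopyJoinSketch {

    // Hypothetical helper: mimics an iterator whose next() returns null when exhausted.
    static String nextOrNull(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Hypothetical join function that mutates the probe record it is handed,
    // which is why later matches need a fresh copy of the original.
    static void join(String build, StringBuilder probe, List<String> out) {
        probe.append('+').append(build);
        out.add(probe.toString());
    }

    // Simple variant: copy the probe record for every match, even a single one.
    static int joinAlwaysCopy(Iterator<String> buildSide, StringBuilder probe, List<String> out) {
        int copies = 0;
        String next;
        while ((next = nextOrNull(buildSide)) != null) {
            join(next, new StringBuilder(probe), out);
            copies++;
        }
        return copies;
    }

    // Optimized variant: look ahead one record and copy only if a second match exists.
    static int joinLazyCopy(Iterator<String> buildSide, StringBuilder probe, List<String> out) {
        String first = nextOrNull(buildSide);
        if (first == null) {
            return 0; // no match at all
        }
        String second = nextOrNull(buildSide);
        if (second == null) {
            // Single match: hand over the probe record itself, no copy needed.
            join(first, probe, out);
            return 0;
        }
        // Two build-side records were fetched before any join call,
        // so both are handled here, outside the loop.
        int copies = 0;
        join(first, new StringBuilder(probe), out);
        copies++;
        join(second, new StringBuilder(probe), out);
        copies++;
        String next;
        while ((next = nextOrNull(buildSide)) != null) {
            join(next, new StringBuilder(probe), out);
            copies++;
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        int lazy = joinLazyCopy(Arrays.asList("a").iterator(), new StringBuilder("p"), out);
        int eager = joinAlwaysCopy(Arrays.asList("a").iterator(), new StringBuilder("p"), out);
        System.out.println("single match: lazy copies=" + lazy + ", eager copies=" + eager);
    }
}
```

With a single build-side match the lazy variant makes no copy while the simple
loop makes one; with two or more matches both make one copy per match, so the
extra branch only pays off when single matches are common.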
> Implement Hash Outer Join algorithm
> -----------------------------------
>
> Key: FLINK-2107
> URL: https://issues.apache.org/jira/browse/FLINK-2107
> Project: Flink
> Issue Type: Sub-task
> Components: Local Runtime
> Reporter: Fabian Hueske
> Assignee: Chiwan Park
> Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a hash outer join algorithm that can cover
> left and right outer joins.
> The implementation can be based on the regular hash join iterators (for
> example `ReusingBuildFirstHashMatchIterator` and
> `NonReusingBuildFirstHashMatchIterator`, see also `MatchDriver` class)
> The Reusing and NonReusing variants differ in whether object instances are
> reused or new objects are created. I would start with the NonReusing variant
> which is safer from a user's point of view and should also be easier to
> implement.