[ https://issues.apache.org/jira/browse/FLINK-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662083#comment-14662083 ]
Chesnay Schepler commented on FLINK-2107:
-----------------------------------------

Looks like an optimization thing to me. You could probably replace the whole block from L116 to L138 with

{code:java}
while (running && ((nextBuildSideRecord = buildSideIterator.next()) != null)) {
    probeCopy = this.probeSideSerializer.copy(probeRecord);
    matchFunction.join(nextBuildSideRecord, probeCopy, collector);
}
{code}

but this would mean that you would always create a copy, even if there is only a single match, which is what the following bit checks for:

{code:java}
if ((tmpRec = buildSideIterator.next()) != null) {
{code}

If this is true, we have accessed two build-side values without calling join, and as such have to deal with them outside the loop.

> Implement Hash Outer Join algorithm
> -----------------------------------
>
>                 Key: FLINK-2107
>                 URL: https://issues.apache.org/jira/browse/FLINK-2107
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Chiwan Park
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a hash outer join algorithm that can cover
> left and right outer joins.
> The implementation can be based on the regular hash join iterators (for
> example `ReusingBuildFirstHashMatchIterator` and
> `NonReusingBuildFirstHashMatchIterator`, see also `MatchDriver` class)
> The Reusing and NonReusing variants differ in whether object instances are
> reused or new objects are created. I would start with the NonReusing variant
> which is safer from a user's point of view and should also be easier to
> implement.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
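To make the trade-off in the comment concrete, here is a minimal, self-contained sketch of the "copy only when there is more than one match" pattern. It is not the Flink code: `LazyProbeCopy`, `joinAll`, `nextOrNull` and the `Join` interface are hypothetical stand-ins, a plain `java.util.Iterator` replaces the build-side iterator, and a `UnaryOperator` replaces the probe-side serializer's `copy`. The method returns the number of copies it made so the saving is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

class LazyProbeCopy {

    /** Hypothetical stand-in for the join function; not Flink's interface. */
    interface Join<B, P> {
        void join(B build, P probe);
    }

    /** Returns the next element, or null when the iterator is exhausted. */
    static <B> B nextOrNull(Iterator<B> it) {
        return it.hasNext() ? it.next() : null;
    }

    /**
     * Joins every build-side record with the probe record and returns the
     * number of probe copies made. Assumes build-side records are non-null.
     */
    static <B, P> int joinAll(Iterator<B> buildSide, P probe,
                              UnaryOperator<P> copier, Join<B, P> fn) {
        B first = nextOrNull(buildSide);
        if (first == null) {
            return 0; // no match at all
        }
        B second = nextOrNull(buildSide);
        if (second == null) {
            // Single match: hand over the original probe record, no copy.
            fn.join(first, probe);
            return 0;
        }
        // Two build-side records were read before any join call,
        // so both are handled here, outside the loop.
        int copies = 0;
        fn.join(first, copier.apply(probe));
        copies++;
        fn.join(second, copier.apply(probe));
        copies++;
        B next;
        while ((next = nextOrNull(buildSide)) != null) {
            fn.join(next, copier.apply(probe));
            copies++;
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int[] probe = {42};
        int single = joinAll(Arrays.asList("a").iterator(), probe,
                p -> p.clone(), (b, p) -> seen.add(b));
        int multi = joinAll(Arrays.asList("a", "b", "c").iterator(), probe,
                p -> p.clone(), (b, p) -> seen.add(b));
        System.out.println("copies (1 match): " + single);
        System.out.println("copies (3 matches): " + multi);
    }
}
```

The simplified `while` loop proposed in the comment corresponds to dropping the two early branches and always calling `copier.apply(probe)`, which costs one extra copy in the single-match case.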