[jira] [Commented] (FLINK-2107) Implement Hash Outer Join algorithm

Chesnay Schepler (JIRA) Fri, 07 Aug 2015 10:01:34 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662083#comment-14662083
 ]


Chesnay Schepler commented on FLINK-2107:
-----------------------------------------

This looks like an optimization to me. You could probably replace the whole 
block from L116 to L138 with
{code:java}
while (running && ((nextBuildSideRecord = buildSideIterator.next()) != null)) {
    probeCopy = this.probeSideSerializer.copy(probeRecord);
    matchFunction.join(nextBuildSideRecord, probeCopy, collector);
}
{code}

But then you would always create a copy of the probe record, even if there is 
only a single match. The following line checks for exactly that case, i.e. 
whether a second match exists:
{code:java}
if ((tmpRec = buildSideIterator.next()) != null) {
{code}

If this is true, we have already read two build-side values without calling 
join, and therefore have to handle them outside the loop.
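
To make the trade-off concrete, here is a minimal, self-contained sketch of the look-ahead pattern described above. It is not the actual Flink code: `LazyProbeCopySketch`, `nextOrNull`, `copy`, `join` and `joinAll` are hypothetical stand-ins for the build-side iterator, `probeSideSerializer.copy(...)` and `matchFunction.join(...)`, and the `running` flag is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LazyProbeCopySketch {

    /** Number of probe-record copies made so far (for demonstration only). */
    static int copies = 0;

    /** Hypothetical helper: mimics an iterator whose next() returns null when exhausted. */
    static String nextOrNull(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Hypothetical stand-in for probeSideSerializer.copy(...): a defensive copy. */
    static StringBuilder copy(StringBuilder probe) {
        copies++;
        return new StringBuilder(probe);
    }

    /** Hypothetical stand-in for matchFunction.join(...): emits "build|probe". */
    static void join(String build, StringBuilder probe, List<String> out) {
        out.add(build + "|" + probe);
    }

    /** Joins one probe record with all matching build-side records. */
    static List<String> joinAll(List<String> buildSide, StringBuilder probe) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = buildSide.iterator();

        String first = nextOrNull(it);
        if (first == null) {
            return out; // no match at all
        }

        String second = nextOrNull(it);
        if (second == null) {
            // Exactly one match: hand over the original record, no copy needed.
            join(first, probe, out);
            return out;
        }

        // Two build-side records have been read without calling join,
        // so both are handled here, before entering the loop.
        join(first, copy(probe), out);
        join(second, copy(probe), out);

        String next;
        while ((next = nextOrNull(it)) != null) {
            // Every further match gets its own fresh copy of the probe record.
            join(next, copy(probe), out);
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder probe = new StringBuilder("p");
        System.out.println(joinAll(Arrays.asList("a"), probe) + " copies=" + copies);
        System.out.println(joinAll(Arrays.asList("a", "b", "c"), probe) + " copies=" + copies);
    }
}
```

With a single build-side match the sketch makes no copy at all. With three matches it makes three, the same number the simple loop would make, so the extra branching only pays off in the single-match case.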

> Implement Hash Outer Join algorithm
> -----------------------------------
>
>                 Key: FLINK-2107
>                 URL: https://issues.apache.org/jira/browse/FLINK-2107
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Chiwan Park
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a hash outer join algorithm that can cover 
> left and right outer joins.
> The implementation can be based on the regular hash join iterators (for 
> example `ReusingBuildFirstHashMatchIterator` and 
> `NonReusingBuildFirstHashMatchIterator`; see also the `MatchDriver` class).
> The Reusing and NonReusing variants differ in whether object instances are 
> reused or new objects are created. I would start with the NonReusing variant 
> which is safer from a user's point of view and should also be easier to 
> implement.



