[
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631756#comment-14631756
]
ASF GitHub Bot commented on FLINK-2105:
---------------------------------------
Github user chiwanpark commented on the pull request:
https://github.com/apache/flink/pull/907#issuecomment-122376238
Hi, I am reviewing these changes. I'm not done yet, but I found some points
that could be improved.
First, there are some duplicated classes, such as `SimpleFlatJoinFunction`,
`MatchRemovingMatcher`, `Match`, and `CollectionIterator`. I think we can move
these classes into the `org.apache.flink.runtime.operators.testutils` package.
After moving them, they can be shared with the test cases for the hash-based
outer join.
Second, and this is just my opinion: how about creating iterator classes for
each outer join type, such as `AbstractMergeLeftOuterJoinIterator`,
`AbstractMergeRightOuterJoinIterator`, and `AbstractMergeFullOuterJoinIterator`,
with derived classes for the reusing and non-reusing variants? I'm concerned
about the time spent comparing the outer join type for every record in the
`callWithNextKey` method, since the outer join type is already fixed before the
join operation starts. However, I'm not sure whether this comparison causes a
noticeable performance decrease. If the decrease is negligible, the second
suggestion can be ignored.
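
To make the second suggestion concrete, here is a rough Java sketch. All names
in it (`OuterJoinDispatchSketch`, `BranchingHandler`, `UnmatchedHandler`, and the
three subclasses) are hypothetical and not taken from the pull request. It only
contrasts comparing the join type on every call with selecting a subclass once
before the join starts, using plain strings in place of real records.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the Flink code base.
final class OuterJoinDispatchSketch {

    enum OuterJoinType { LEFT, RIGHT, FULL }

    /** Variant A: a single class that compares the join type on every call. */
    static final class BranchingHandler {
        private final OuterJoinType type;

        BranchingHandler(OuterJoinType type) {
            this.type = type;
        }

        void unmatchedLeft(String left, List<String> out) {
            if (type == OuterJoinType.LEFT || type == OuterJoinType.FULL) {
                out.add(left + "|null");
            }
        }

        void unmatchedRight(String right, List<String> out) {
            if (type == OuterJoinType.RIGHT || type == OuterJoinType.FULL) {
                out.add("null|" + right);
            }
        }
    }

    /** Variant B: one subclass per join type, selected once up front. */
    abstract static class UnmatchedHandler {
        // By default an unmatched record is dropped; subclasses override
        // only the side(s) they have to preserve.
        void unmatchedLeft(String left, List<String> out) { }

        void unmatchedRight(String right, List<String> out) { }

        static UnmatchedHandler forType(OuterJoinType type) {
            switch (type) {
                case LEFT:
                    return new LeftOuter();
                case RIGHT:
                    return new RightOuter();
                default:
                    return new FullOuter();
            }
        }
    }

    static final class LeftOuter extends UnmatchedHandler {
        @Override
        void unmatchedLeft(String left, List<String> out) {
            out.add(left + "|null");
        }
    }

    static final class RightOuter extends UnmatchedHandler {
        @Override
        void unmatchedRight(String right, List<String> out) {
            out.add("null|" + right);
        }
    }

    static final class FullOuter extends UnmatchedHandler {
        @Override
        void unmatchedLeft(String left, List<String> out) {
            out.add(left + "|null");
        }

        @Override
        void unmatchedRight(String right, List<String> out) {
            out.add("null|" + right);
        }
    }
}
```

Both variants produce the same output for a given join type. Whether variant B
is actually faster would have to be measured, which is why the suggestion above
is optional.
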
> Implement Sort-Merge Outer Join algorithm
> -----------------------------------------
>
> Key: FLINK-2105
> URL: https://issues.apache.org/jira/browse/FLINK-2105
> Project: Flink
> Issue Type: Sub-task
> Components: Local Runtime
> Reporter: Fabian Hueske
> Assignee: Ricky Pogalz
> Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a sort-merge outer join algorithm that can
> cover left, right, and full outer joins.
> The implementation can be based on the regular sort-merge join iterator
> ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also
> {{MatchDriver}} class).
> The Reusing and NonReusing variants differ in whether object instances are
> reused or new objects are created. I would start with the NonReusing variant
> which is safer from a user's point of view and should also be easier to
> implement.
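
For illustration, the merge logic that the issue describes can be sketched on
small in-memory lists. This is a minimal sketch under stated assumptions, not
the Flink implementation: the class `SortMergeOuterJoinSketch`, the `Rec` type,
and the `join` method are hypothetical names, both inputs are assumed to be
already sorted by key, and output pairs are rendered as strings with `null`
standing in for the missing side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Flink runtime API.
final class SortMergeOuterJoinSketch {

    enum OuterJoinType { LEFT, RIGHT, FULL }

    /** A keyed record; both inputs must be sorted by key in ascending order. */
    static final class Rec {
        final int key;
        final String value;

        Rec(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static List<String> join(List<Rec> left, List<Rec> right, OuterJoinType type) {
        // Decide once which unmatched side(s) must be preserved.
        boolean keepLeft = type != OuterJoinType.RIGHT;
        boolean keepRight = type != OuterJoinType.LEFT;
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;

        while (i < left.size() && j < right.size()) {
            int lk = left.get(i).key;
            int rk = right.get(j).key;
            if (lk < rk) {
                // Left key has no partner on the right side.
                if (keepLeft) {
                    out.add(left.get(i).value + "|null");
                }
                i++;
            } else if (lk > rk) {
                // Right key has no partner on the left side.
                if (keepRight) {
                    out.add("null|" + right.get(j).value);
                }
                j++;
            } else {
                // Equal keys: find both runs and emit their cross product.
                int iEnd = i;
                int jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).key == lk) {
                    iEnd++;
                }
                while (jEnd < right.size() && right.get(jEnd).key == lk) {
                    jEnd++;
                }
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(left.get(a).value + "|" + right.get(b).value);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }

        // One input is exhausted; the rest of the other side is unmatched.
        for (; keepLeft && i < left.size(); i++) {
            out.add(left.get(i).value + "|null");
        }
        for (; keepRight && j < right.size(); j++) {
            out.add("null|" + right.get(j).value);
        }
        return out;
    }
}
```

The sketch builds a new output string for every result pair, which loosely
mirrors the NonReusing style mentioned above; a Reusing variant would instead
write results into objects that are recycled between calls.
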