Github user chiwanpark commented on the pull request:
https://github.com/apache/flink/pull/907#issuecomment-122376238
Hi, I am reviewing this changes. I'm not done yet but I found some points
which are able to improve.
First, there are some duplicated classes such as `SimpleFlatJoinFunction`,
`MatchRemovingMatcher`, `Match`, `CollectionIterator`. I think that we can this
classes move under `org.apache.flink.runtime.operators.testutils` package.
After moving them, they can be shared with test cases for hash-based outer join.
Second, this is just my opinion, how about creating iterator classes for
each outer join type such as, `AbstractMergeLeftOuterJoinIterator`,
`AbstractMergeRightOuterJoinIterator`, `AbstractMergeFullOuterJoinIterator` and
derived classes by reusing variable? I'm concerned about time consuming by
comparing outer join type for many records in `callWithNextKey` method. The
outer join type is already decided before doing join operation. But I'm not
sure that there is obvious performance decrease by this comparing. If the
performance decrease is negligible, the second suggestion could be ignored.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---