[ https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648463#comment-14648463 ]
ASF GitHub Bot commented on FLINK-2105:
---------------------------------------
Github user r-pogalz commented on the pull request:
https://github.com/apache/flink/pull/907#issuecomment-126517321
@fhueske, I went through the classes and tried to catch all occurrences of
`match` and replaced them with `join`. Hopefully I did not miss anything :)
Moreover, I set up a benchmark with JMH and did several runs on my machine
(MacBook Air, 1.3 GHz Intel Core i5, 4 GB 1600 MHz DDR3). I pushed the code to
another branch, so you can check it for further details (the commit can be seen
[here](https://github.com/r-pogalz/flink/commit/629c2ff5caa9775d968deecdc2ff0b965310a09e)).
I decided to compare a dedicated class for the left outer join against the left
outer join performed by the existing class that handles all outer join types.
Each benchmark iteration measures the time needed to perform the outer join on
the entire inputs. For the first couple of runs, I fixed the cardinality of the
right input at 10k entries and varied the size of the left input (20k, 50k, and
100k).
Cardinality of left input | Same class (avg. time in seconds) | Dedicated class (avg. time in seconds)
------------ | ------------- | ------------
20k | **4.264** | 4.282
50k | **7.845** | 7.913
100k | 14.090 | **13.907**
I also did a test run with **50k entries on both inputs** and got the
following results:
Same class (avg. time in seconds) | Dedicated class (avg. time in seconds)
-------------- | ---------------
**10.251** | 10.571
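
For reference, the relative difference of the dedicated class with respect to the same class, computed from the averages in the two tables above, is:

$$
\begin{aligned}
\Delta &= \frac{t_{\text{dedicated}} - t_{\text{same}}}{t_{\text{same}}} \\
\Delta_{20\text{k}} &= \frac{4.282 - 4.264}{4.264} \approx +0.42\% \\
\Delta_{50\text{k}} &= \frac{7.913 - 7.845}{7.845} \approx +0.87\% \\
\Delta_{100\text{k}} &= \frac{13.907 - 14.090}{14.090} \approx -1.30\% \\
\Delta_{50\text{k}/50\text{k}} &= \frac{10.571 - 10.251}{10.251} \approx +3.12\%
\end{aligned}
$$

All differences stay within roughly 3%. Whether they exceed run-to-run noise depends on the standard deviations in the linked results.
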
The results show that both implementations perform almost identically. So I
guess the JIT compiler optimizes the if-statements in our current
implementation well. What do you think, @fhueske and @chiwanpark?
I also pushed the results. Please look
[here](https://github.com/r-pogalz/flink/commit/32dfabe33efc382ee2bb1c879eba507318cdbc24)
for detailed information (e.g. min, max, std. deviation).
> Implement Sort-Merge Outer Join algorithm
> -----------------------------------------
>
> Key: FLINK-2105
> URL: https://issues.apache.org/jira/browse/FLINK-2105
> Project: Flink
> Issue Type: Sub-task
> Components: Local Runtime
> Reporter: Fabian Hueske
> Assignee: Ricky Pogalz
> Priority: Minor
> Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment.
> This issue proposes to implement a sort-merge outer join algorithm that can
> cover left, right, and full outer joins.
> The implementation can be based on the regular sort-merge join iterator
> ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also
> {{MatchDriver}} class).
> The Reusing and NonReusing variants differ in whether object instances are
> reused or new objects are created. I would start with the NonReusing variant
> which is safer from a user's point of view and should also be easier to
> implement.
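
The issue above asks for one algorithm that covers left, right, and full outer joins. Below is a minimal in-memory sketch of that idea in Java, assuming both inputs are already sorted on the join key and fit in lists. The class `SortMergeOuterJoin`, the enum `OuterJoinType`, and the method `join` are hypothetical names, not Flink's API. Like the NonReusing variant, the sketch never reuses objects: every result is a fresh value returned by the `joiner`, which receives `null` for the missing side. The `keepLeft`/`keepRight` flags are the kind of join-type branches that a single class covering all outer join types has to carry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical in-memory sketch of a sort-merge outer join; not Flink's API.
// Precondition: both inputs are already sorted ascending on their join keys.
final class SortMergeOuterJoin {

    // Which side keeps records that have no join partner.
    enum OuterJoinType { LEFT, RIGHT, FULL }

    static <L, R, K extends Comparable<K>, O> List<O> join(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey,
            OuterJoinType type, BiFunction<L, R, O> joiner) {
        // Decide once which unmatched records to emit; this is the only join-type-specific logic.
        boolean keepLeft = type == OuterJoinType.LEFT || type == OuterJoinType.FULL;
        boolean keepRight = type == OuterJoinType.RIGHT || type == OuterJoinType.FULL;
        List<O> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            K lk = leftKey.apply(left.get(i));
            K rk = rightKey.apply(right.get(j));
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                // Left key is smaller: this left record has no partner on the right.
                if (keepLeft) {
                    out.add(joiner.apply(left.get(i), null));
                }
                i++;
            } else if (cmp > 0) {
                // Right key is smaller: this right record has no partner on the left.
                if (keepRight) {
                    out.add(joiner.apply(null, right.get(j)));
                }
                j++;
            } else {
                // Equal keys: find the end of the run with this key on each side...
                int iEnd = i;
                while (iEnd < left.size() && leftKey.apply(left.get(iEnd)).compareTo(lk) == 0) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && rightKey.apply(right.get(jEnd)).compareTo(rk) == 0) {
                    jEnd++;
                }
                // ...and emit the cross product of both runs, exactly as an inner join would.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(joiner.apply(left.get(a), right.get(b)));
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        // One input is exhausted: the remaining records of the other input have no partners.
        for (; keepLeft && i < left.size(); i++) {
            out.add(joiner.apply(left.get(i), null));
        }
        for (; keepRight && j < right.size(); j++) {
            out.add(joiner.apply(null, right.get(j)));
        }
        return out;
    }
}
```

For example, a LEFT outer join of the left keys `a, b, b, d` with the right keys `b, c, d, d` keeps `a` (paired with `null`), drops the unmatched right `c`, and emits two results each for `b` and `d`. A real runtime operator would stream over sorted inputs instead of holding them in lists.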
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)