[ 
https://issues.apache.org/jira/browse/FLINK-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648463#comment-14648463
 ] 

ASF GitHub Bot commented on FLINK-2105:
---------------------------------------

Github user r-pogalz commented on the pull request:

    https://github.com/apache/flink/pull/907#issuecomment-126517321
  
    @fhueske, I went through the classes and replaced all occurrences of 
`match` with `join`. Hopefully I did not miss anything :)
    
    Moreover, I set up a benchmark with JMH and did several runs on my machine 
(MacBook Air, 1.3 GHz Intel Core i5, 4 GB 1600 MHz DDR3). I pushed the code to 
another branch, so you can check it for further information (the commit can be seen 
[here](https://github.com/r-pogalz/flink/commit/629c2ff5caa9775d968deecdc2ff0b965310a09e)).
 I decided to test a dedicated class for the left outer join against the left 
outer join as implemented by the existing class that handles all outer join types. 
One iteration of the benchmark measures the time needed to perform the outer join 
over both entire inputs. For the first couple of runs I fixed the cardinality of 
the right input at 10k entries and varied the size of the left input (20k, 50k and 
100k).
    
    Cardinality of left input | Same class (avg. time in seconds) | Dedicated class (avg. time in seconds)
    ------------ | ------------- | ------------
    20k | **4.264** | 4.282
    50k | **7.845** | 7.913
    100k | 14.090 | **13.907**
    
    Moreover, I did a test run with **50k entries on both inputs** and got the 
following results:
    
    Same class (avg. time in seconds) | Dedicated class (avg. time in seconds)
    -------------- | ---------------
    **10.251** | 10.571
    
    The results show that both implementations perform almost identically. So I 
guess the JIT compiler optimizes the if-statements in our current implementation 
well. Or what do you think, @fhueske and @chiwanpark?
    
    I also pushed the results. Please look 
[here](https://github.com/r-pogalz/flink/commit/32dfabe33efc382ee2bb1c879eba507318cdbc24)
 for detailed information (e.g. min, max, std. deviation).
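
    To put "almost identical" into numbers, the relative difference between the 
two averages can be computed directly from the tables above. The following is 
only an illustrative sketch: the class and method names are hypothetical and are 
not part of the benchmark code on the branch.

```java
final class RelativeDiffSketch {

    /** Signed difference of the dedicated class relative to the same class, in percent. */
    static double relativeDiffPercent(double sameClassSeconds, double dedicatedSeconds) {
        return (dedicatedSeconds - sameClassSeconds) / sameClassSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Average times in seconds, copied from the two tables above
        // (left input size / right input size).
        String[] labels = {"20k/10k", "50k/10k", "100k/10k", "50k/50k"};
        double[] same = {4.264, 7.845, 14.090, 10.251};
        double[] dedicated = {4.282, 7.913, 13.907, 10.571};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %+.2f%%%n",
                    labels[i], relativeDiffPercent(same[i], dedicated[i]));
        }
    }
}
```

    A positive value means the dedicated class was slower. With these averages 
the gap is roughly 0.4%, 0.9% and -1.3% for the 10k right input and about 3.1% 
for the 50k/50k run. Whether such gaps are just noise has to be judged against 
the std. deviations in the linked results.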


> Implement Sort-Merge Outer Join algorithm
> -----------------------------------------
>
>                 Key: FLINK-2105
>                 URL: https://issues.apache.org/jira/browse/FLINK-2105
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Ricky Pogalz
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Flink does not natively support outer joins at the moment. 
> This issue proposes to implement a sort-merge outer join algorithm that can 
> cover left, right, and full outer joins.
> The implementation can be based on the regular sort-merge join iterator 
> ({{ReusingMergeMatchIterator}} and {{NonReusingMergeMatchIterator}}, see also 
> {{MatchDriver}} class).
> The Reusing and NonReusing variants differ in whether object instances are 
> reused or new objects are created. I would start with the NonReusing variant 
> which is safer from a user's point of view and should also be easier to 
> implement.
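
The algorithm proposed in the issue description can be illustrated with a minimal 
sketch. It is not Flink's iterator code: it assumes both inputs are plain `int` 
key arrays that are already sorted in ascending order, and the class, enum, and 
method names are hypothetical. A single method covers left, right, and full outer 
joins by checking the join type with `if` statements. New result rows are 
allocated for every pair, which loosely corresponds to the NonReusing variant.

```java
import java.util.ArrayList;
import java.util.List;

final class SortMergeOuterJoinSketch {

    /** Which side's unmatched keys are kept (padded with null on the other side). */
    enum OuterJoinType { LEFT, RIGHT, FULL }

    /**
     * Joins two key arrays that are already sorted ascending.
     * Each result row is {leftKey, rightKey}; a missing side is null.
     */
    static List<Integer[]> join(int[] left, int[] right, OuterJoinType type) {
        List<Integer[]> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.length && r < right.length) {
            if (left[l] < right[r]) {
                // Left key has no partner on the right side.
                if (type != OuterJoinType.RIGHT) {
                    out.add(new Integer[] {left[l], null});
                }
                l++;
            } else if (left[l] > right[r]) {
                // Right key has no partner on the left side.
                if (type != OuterJoinType.LEFT) {
                    out.add(new Integer[] {null, right[r]});
                }
                r++;
            } else {
                // Equal keys: find both runs of duplicates and emit their cross product.
                int key = left[l];
                int lEnd = l;
                while (lEnd < left.length && left[lEnd] == key) {
                    lEnd++;
                }
                int rEnd = r;
                while (rEnd < right.length && right[rEnd] == key) {
                    rEnd++;
                }
                for (int i = l; i < lEnd; i++) {
                    for (int j = r; j < rEnd; j++) {
                        out.add(new Integer[] {left[i], right[j]});
                    }
                }
                l = lEnd;
                r = rEnd;
            }
        }
        // One input is exhausted; the rest of the other input is unmatched.
        if (type != OuterJoinType.RIGHT) {
            for (; l < left.length; l++) {
                out.add(new Integer[] {left[l], null});
            }
        }
        if (type != OuterJoinType.LEFT) {
            for (; r < right.length; r++) {
                out.add(new Integer[] {null, right[r]});
            }
        }
        return out;
    }
}
```

For example, `join(new int[] {1, 2, 2, 4}, new int[] {2, 3, 4, 4}, OuterJoinType.LEFT)` 
keeps the unmatched left key 1 as `{1, null}` and drops the unmatched right key 3, 
whereas `OuterJoinType.FULL` keeps both.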



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
