[ 
https://issues.apache.org/jira/browse/FLINK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876425#comment-14876425
 ] 

ASF GitHub Bot commented on FLINK-2576:
---------------------------------------

Github user jkovacs commented on the pull request:

    https://github.com/apache/flink/pull/1138#issuecomment-141569673
  
    To partly answer my own question: one big drawback of downgrading the tuple 
field types to `GenericTypeInfo` is that the generic Kryo serializers will be 
used for (de)serialization and comparison, and these are significantly slower 
than the native Flink serializers and comparators for basic types such as 
`Integer` (according to 
[this blog post](http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html)).
    
    One obvious way to work around this is to only downgrade the fields that 
are actually nullable, and keep the original types of the definitely non-null 
fields (i.e. the types from the outer side of a left or right outer join). This 
way the user can still group/join/sort efficiently on the non-null fields, 
while preserving null safety for the other fields.
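
    A minimal sketch of that rule, in plain Java without any Flink 
dependencies. All names here (`OuterJoinProjectionTypes`, `Field`, 
`resultTypes`) are hypothetical and are not taken from the pull request; the 
string `Generic<...>` merely stands in for a field downgraded to 
`GenericTypeInfo`.

```java
// Hypothetical illustration only: none of these names are Flink APIs.
final class OuterJoinProjectionTypes {

    enum JoinType { LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** One projected field: the input it comes from and its declared type name. */
    static final class Field {
        final boolean fromLeft;
        final String typeName;

        Field(boolean fromLeft, String typeName) {
            this.fromLeft = fromLeft;
            this.typeName = typeName;
        }
    }

    /** Returns true if records of the given side may be missing in the join result. */
    static boolean isNullableSide(JoinType joinType, boolean leftSide) {
        switch (joinType) {
            case LEFT_OUTER:
                return !leftSide;   // left is the outer side, so only right fields can be null
            case RIGHT_OUTER:
                return leftSide;    // right is the outer side, so only left fields can be null
            default:
                return true;        // FULL_OUTER: either side can be null
        }
    }

    /** Downgrades only the fields that can actually be null; all others keep their type. */
    static String[] resultTypes(JoinType joinType, Field... projection) {
        String[] result = new String[projection.length];
        for (int i = 0; i < projection.length; i++) {
            Field f = projection[i];
            result[i] = isNullableSide(joinType, f.fromLeft)
                    ? "Generic<" + f.typeName + ">"
                    : f.typeName;
        }
        return result;
    }
}
```

    Under these assumptions, a left outer join that projects an `Integer` key 
from the left input and a `String` from the right input keeps the key's 
original type and downgrades only the `String` field, while a full outer join 
downgrades both.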
    
    I pushed another commit for this to my temporary branch for review, if this 
makes sense: 
https://github.com/jkovacs/flink/compare/feature/FLINK-2576...jkovacs:feature/FLINK-2576-projection-types
    
    As you can see, I was really hoping to make the projection joins work 
properly :-) but if you feel that the effort isn't worth it, or that I'm 
missing something else entirely, we can certainly scrap that and throw an 
`InvalidProgramException` when the user tries to do a projection outer join 
instead of defining their own join UDF. Opinions on that are welcome.


> Add outer joins to API and Optimizer
> ------------------------------------
>
>                 Key: FLINK-2576
>                 URL: https://issues.apache.org/jira/browse/FLINK-2576
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Java API, Optimizer, Scala API
>            Reporter: Ricky Pogalz
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Add left/right/full outer join methods to the DataSet APIs (Java, Scala) and 
> to the optimizer of Flink.
> Initially, the execution strategy should be a sort-merge outer join 
> (FLINK-2105) but can later be extended to hash joins for left/right outer 
> joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
