Github user jkovacs commented on the pull request:
https://github.com/apache/flink/pull/1138#issuecomment-141569673
To partly answer my own question: One big drawback of downgrading the tuple
field types to `GenericTypeInfo` is that the generic Kryo serializers will then
be used for (de)serialization and comparison, and these are significantly slower
than the native Flink serializers and comparators for basic types such as
`Integer` (according to [this blog
post](http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html)).
One obvious way to work around this is to downgrade only the fields that are
actually nullable, and keep the original types of the definitely non-null
fields (i.e. the types from the outer side of a left or right outer join). That
way the user can still group/join/sort efficiently on the non-null fields,
while preserving null safety for the other fields.
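As a rough illustration (this is a hypothetical sketch, not the actual PR code; the class and field layout are made up), the result type of a left outer projection join could be assembled so that only the nullable right-side field is downgraded to `GenericTypeInfo`, while the left-side fields keep their native type info:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class ProjectionTypeSketch {

    // Result type of a hypothetical left outer projection join producing
    // Tuple3<Integer, String, Long>: fields 0 and 1 come from the (non-null)
    // left side, field 2 from the (nullable) right side.
    public static TypeInformation<Tuple3<Integer, String, Long>> resultType() {
        // Left-side fields keep their native type info, so Flink's fast
        // serializers and comparators still apply when grouping/joining/sorting
        // on them.
        TypeInformation<?> leftId   = BasicTypeInfo.INT_TYPE_INFO;
        TypeInformation<?> leftName = BasicTypeInfo.STRING_TYPE_INFO;

        // Only the right-side field, which may be null in a left outer join,
        // is downgraded to the Kryo-backed GenericTypeInfo.
        TypeInformation<?> rightValue = new GenericTypeInfo<>(Long.class);

        return new TupleTypeInfo<Tuple3<Integer, String, Long>>(
                leftId, leftName, rightValue);
    }
}
```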
I pushed another commit for this to my temporary branch for review, in case
this approach makes sense:
https://github.com/jkovacs/flink/compare/feature/FLINK-2576...jkovacs:feature/FLINK-2576-projection-types
As you can see, I was really hoping to make the projection joins work
properly :-) but if you feel that the effort isn't worth it, or that I'm missing
something else entirely, we can certainly scrap that and simply throw an
`InvalidProgramException` when the user tries to do a project outer join
instead of defining their own join UDF. Opinions on that are welcome.