Github user jkovacs commented on the pull request:
https://github.com/apache/flink/pull/1138#issuecomment-141569673
To partly answer my own question: One big drawback of downgrading the tuple
field types to `GenericTypeInfo` is that the generic Kryo serializers will then
be used for (de)serialization and comparison, and these are significantly slower
than the native Flink serializers and comparators for basic types such as
`Integer` (according to [this blog
post](http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html)).
One obvious way to work around this is to downgrade only the fields that are
actually nullable, and keep the original types of the definitely non-null
fields (i.e. the types from the outer side of a left or right outer join). That
way the user can still group/join/sort efficiently on the non-null fields,
while preserving null safety for the other fields.
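As a rough illustration (this is a hypothetical sketch, not the actual PR code; the class and field layout are made up), the result type of a left outer projection join could be assembled so that only the nullable right-side field is downgraded to `GenericTypeInfo`, while the left-side fields keep their native type info:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class ProjectionTypeSketch {

    // Result type of a hypothetical left outer projection join producing
    // Tuple3<Integer, String, Long>: fields 0 and 1 come from the (non-null)
    // left side, field 2 from the (nullable) right side.
    public static TypeInformation<Tuple3<Integer, String, Long>> resultType() {
        // Left-side fields keep their native type info, so Flink's fast
        // serializers and comparators still apply when grouping/joining/sorting
        // on them.
        TypeInformation<?> leftId   = BasicTypeInfo.INT_TYPE_INFO;
        TypeInformation<?> leftName = BasicTypeInfo.STRING_TYPE_INFO;

        // Only the right-side field, which may be null in a left outer join,
        // is downgraded to the Kryo-backed GenericTypeInfo.
        TypeInformation<?> rightValue = new GenericTypeInfo<>(Long.class);

        return new TupleTypeInfo<Tuple3<Integer, String, Long>>(
                leftId, leftName, rightValue);
    }
}
```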
I pushed another commit for this to my temporary branch for review, in case
this approach makes sense:
https://github.com/jkovacs/flink/compare/feature/FLINK-2576...jkovacs:feature/FLINK-2576-projection-types
As you can see, I was really hoping to make the projection joins work
properly :-) but if you feel that the effort isn't worth it, or that I'm missing
something else entirely, we can certainly scrap that and simply throw an
`InvalidProgramException` when the user tries to do a project outer join
instead of defining their own join UDF. Opinions on that are welcome.