[
https://issues.apache.org/jira/browse/FLINK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876425#comment-14876425
]
ASF GitHub Bot commented on FLINK-2576:
---------------------------------------
Github user jkovacs commented on the pull request:
https://github.com/apache/flink/pull/1138#issuecomment-141569673
To partly answer my own question: one big drawback of downgrading the tuple
field types to `GenericTypeInfo` is that (de)serialization and comparison
will then use the generic Kryo serializers, which are significantly slower than
the native Flink serializers and comparators for basic types such as Integer
(according to [this blog
post](http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html)).
One obvious way to work around this is to only downgrade the fields that
are actually nullable, and keep the original types of the definitely non-null
fields (i.e. the types from the outer side of a left or right outer join). This
way the user can still group/join/sort efficiently on the non-null fields,
while preserving null safety for the other fields.
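A minimal sketch of that idea, assuming hypothetical stand-in types (`FieldType`, `JoinType`, `resultTypes`) rather than Flink's actual type information classes. The `generic` flag only marks the fields that would be downgraded to `GenericTypeInfo`; everything else keeps its original type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; none of these names are Flink API.
final class OuterJoinProjectionTypes {

    enum JoinType { LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** A field's Java class plus whether it is downgraded to a generic (null-safe) type. */
    static final class FieldType {
        final Class<?> clazz;
        final boolean generic;

        FieldType(Class<?> clazz, boolean generic) {
            this.clazz = clazz;
            this.generic = generic;
        }

        @Override
        public String toString() {
            return (generic ? "Generic<" : "Basic<") + clazz.getSimpleName() + ">";
        }
    }

    /**
     * Computes the projected result types of an outer join: only fields from a
     * side that may be padded with nulls are marked generic.
     */
    static List<FieldType> resultTypes(JoinType joinType,
                                       List<Class<?>> left,
                                       List<Class<?>> right) {
        // A left outer join keeps every left record, so only right fields can be null;
        // a right outer join is the mirror image; a full outer join can null either side.
        boolean leftNullable = joinType != JoinType.LEFT_OUTER;
        boolean rightNullable = joinType != JoinType.RIGHT_OUTER;

        List<FieldType> result = new ArrayList<>();
        for (Class<?> c : left) {
            result.add(new FieldType(c, leftNullable));
        }
        for (Class<?> c : right) {
            result.add(new FieldType(c, rightNullable));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> left = Arrays.<Class<?>>asList(Integer.class, String.class);
        List<Class<?>> right = Arrays.<Class<?>>asList(Long.class);
        System.out.println(resultTypes(JoinType.LEFT_OUTER, left, right));
    }
}
```

In this model a left outer join leaves the `Integer` and `String` fields of the left input untouched, so they could still be used as efficient grouping or sort keys, while the `Long` field from the right input is the only one marked generic.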
I pushed another commit for this to my temporary branch for review, if this
makes sense:
https://github.com/jkovacs/flink/compare/feature/FLINK-2576...jkovacs:feature/FLINK-2576-projection-types
As you can see, I was really hoping to make the projection joins work
properly :-) but if you feel that the effort isn't worth it, or that I'm missing
something else entirely, we can simply scrap that and throw an
`InvalidProgramException` when the user tries to do a projection outer join
instead of defining their own join UDF. Opinions on that are welcome.
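For comparison, the fallback option could look roughly like the sketch below. The guard class, the `JoinType` enum, and the error message are hypothetical, and the nested exception is only a local stand-in so the example compiles without Flink on the classpath:

```java
// Hypothetical guard for illustration only; not taken from the pull request.
final class ProjectionJoinGuard {

    /** Local stand-in for Flink's InvalidProgramException (an unchecked exception). */
    static final class InvalidProgramException extends RuntimeException {
        InvalidProgramException(String message) {
            super(message);
        }
    }

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** Allows projection only for inner joins and rejects every outer join type. */
    static void checkProjectionAllowed(JoinType joinType) {
        if (joinType != JoinType.INNER) {
            throw new InvalidProgramException(
                "Projection is not supported for " + joinType
                    + " joins; define a join function instead.");
        }
    }
}
```

This is simpler than mixing native and generic field types, at the cost of forcing users to write a join function for every outer join.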
> Add outer joins to API and Optimizer
> ------------------------------------
>
> Key: FLINK-2576
> URL: https://issues.apache.org/jira/browse/FLINK-2576
> Project: Flink
> Issue Type: Sub-task
> Components: Java API, Optimizer, Scala API
> Reporter: Ricky Pogalz
> Priority: Minor
> Fix For: pre-apache
>
>
> Add left/right/full outer join methods to the DataSet APIs (Java, Scala) and
> to the optimizer of Flink.
> Initially, the execution strategy should be a sort-merge outer join
> (FLINK-2105) but can later be extended to hash joins for left/right outer
> joins.