GitHub user jkovacs opened a pull request:
https://github.com/apache/flink/pull/1138
Feature/flink 2576
This PR implements
[FLINK-2576](https://issues.apache.org/jira/browse/FLINK-2576) (Adding the
outer join operator to the optimizer and Java/Scala APIs, previously part of
[FLINK-2106](https://issues.apache.org/jira/browse/FLINK-2106)).
For reference, the previous pull requests for the outer join implementation
were #907 and #1052.
First of all, thanks for the help we received in person and on the mailing
list.
I designed the API as per the consensus on the mailing list and tried to
reuse as much code from the join operator API as possible.
This PR contributes the following:
* An OuterJoinNode to the optimizer, and three sort-merge OuterJoinDescriptors,
one for each type of outer join
* One outer join base operator
* left/right/fullOuterJoin() methods to the Java and Scala APIs
  * Including some updates to the join javadocs in Java/Scala APIs
* Refactorings where necessary (mostly concerned with being able to reuse
inner join operator code)
  * Specifically refactoring of the JoinOperator in the Java API:
    * Added JoinType property, identifying inner/left-/right-/full outer join
    * Removed PlanXUnwrappingJoinOperator classes, instead promoting the
    TupleXUnwrappingJoiners to be able to reuse the existing unwrapping logic
    * Added inner class JoinOperatorBaseBuilder to be able to transparently
    construct a base operator for all types of joins, as well as tuple unwrapping
    of left and right inputs
    * Make sure the user can't compile a default join plan for outer joins,
    as well as make projection joins work with outer joins (see below)
* End-to-end integration tests for the outer join operator using the Java
and Scala APIs in flink-tests
Usage & Implementation:
In both APIs we prohibit using the default join functionality for outer
joins. The user is required
to specify a custom join function that combines the (potentially `null`)
left and right side tuples.
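
To illustrate the contract described above, here is a minimal, self-contained sketch in plain Java. It does not use the Flink API; `LeftOuterJoinSketch`, `Row`, and `leftOuterJoin` are hypothetical names invented for this example, and the nested-loop strategy is chosen only for brevity (the PR itself uses sort-merge descriptors). The point is only that the user-supplied join function has to handle a `null` partner on the outer side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

public class LeftOuterJoinSketch {

    /** Minimal key/value record standing in for a two-field tuple (hypothetical). */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Naive nested-loop left outer join. Every left row is emitted at least once;
     * if no right row has the same key, the join function receives null as the
     * right-side argument.
     */
    static <R> List<R> leftOuterJoin(List<Row> left, List<Row> right,
                                     BiFunction<Row, Row, R> joinFunction) {
        List<R> out = new ArrayList<>();
        for (Row l : left) {
            boolean matched = false;
            for (Row r : right) {
                if (l.key == r.key) {
                    out.add(joinFunction.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                // No join partner: the right side is passed as null.
                out.add(joinFunction.apply(l, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = Arrays.asList(new Row(1, "a"), new Row(2, "b"));
        List<Row> right = Arrays.asList(new Row(1, "x"));

        // The join function must explicitly decide what to do with a null side.
        List<String> result = leftOuterJoin(left, right,
                (l, r) -> l.value + "-" + (r == null ? "none" : r.value));

        System.out.println(result); // prints [a-x, b-none]
    }
}
```

A default join that simply pairs both sides into a tuple would have to emit a tuple containing a `null` field here, which is why a custom join function is required for outer joins.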
In the Java API we support the projection join functionality for outer
joins. (Projection joins are not yet implemented in the Scala API for inner
joins, therefore no changes there.)
Note that when the user performs a projection join, the
type information is lost.
This is also the case for the inner projection join. Additionally, we
explicitly "downgrade" the result type information of an outer projection join
to a Tuple of `GenericTypeInfo<>(Object.class)`, in order to be able to
serialize `null` values.
A nicer way to do this would be to use an `Optional<T>` type to represent
nullable tuple values, but because we can't rely on Java 8 types and I did not
want to hardcode a dependency on a third-party `Optional` type (e.g. from Guava)
into the API, we went this route for now.
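
As a rough illustration of why the projected result ends up untyped, the following plain-Java sketch projects selected fields of both sides into a single `Object[]` row. This is not the code in this PR and does not use Flink's type system; `OuterProjectionSketch` and `project` are hypothetical names, and `Object[]` merely stands in for a Tuple whose fields are all treated as generic `Object`s:

```java
import java.util.Arrays;

public class OuterProjectionSketch {

    /**
     * Concatenates the selected fields of both sides into one untyped row.
     * A null side (no join partner) contributes null for each of its selected fields,
     * so every output field has to be able to hold null.
     */
    static Object[] project(Object[] left, int[] leftFields,
                            Object[] right, int[] rightFields) {
        Object[] out = new Object[leftFields.length + rightFields.length];
        int pos = 0;
        for (int f : leftFields) {
            out[pos++] = (left == null) ? null : left[f];
        }
        for (int f : rightFields) {
            out[pos++] = (right == null) ? null : right[f];
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] left = {1, "a"};

        // Left row without a partner: the projected right field becomes null.
        Object[] projected = project(left, new int[] {0, 1}, null, new int[] {1});

        System.out.println(Arrays.toString(projected)); // prints [1, a, null]
    }
}
```

With an `Optional`-style wrapper the missing field could instead be represented as an explicit "absent" value and the field types could be kept, at the cost of the dependency discussed above.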
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkovacs/flink feature/FLINK-2576
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1138.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1138
----
commit e3ea010462e0290b857c296b0ff9572332827421
Author: Johann Kovacs <[email protected]>
Date: 2015-09-08T16:23:54Z
[FLINK-2576] [refactor] Extract abstract superclass for join operators
commit 061e61027a070fa408c6e9a072d5a755a5dbcc0e
Author: Johann Kovacs <[email protected]>
Date: 2015-08-25T12:16:02Z
[FLINK-2576] [refactor] Extract common optimizer code to superclass
commit 1465aa1d38e1730cf400e1d3164400efd72dd420
Author: r-pogalz <[email protected]>
Date: 2015-07-07T19:40:04Z
[FLINK-2576] Add outer join base operator
commit d5ae5d74a7283512440cffda4e1675760a9d335e
Author: Johann Kovacs <[email protected]>
Date: 2015-09-09T09:02:08Z
[FLINK-2576] Add outer join to optimizer
commit 0a89a0dbbe8a8b8bf6c38382e02d072d169421cf
Author: Johann Kovacs <[email protected]>
Date: 2015-09-10T15:24:29Z
[FLINK-2576] [tests] Don't swallow exceptions during program compilation
and optimization
commit 1ccca5ba74ea9da82c14ab350582ab62dbf540a3
Author: Johann Kovacs <[email protected]>
Date: 2015-09-16T15:00:43Z
[FLINK-2576] Add outer join operator to Java DataSet API
commit b66b1b0a42449bc1eedfd74adcea87cb52d2a09e
Author: Johann Kovacs <[email protected]>
Date: 2015-09-16T14:56:03Z
[FLINK-2576] Add outer join operator to Scala DataSet API
----