[
https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473424#comment-16473424
]
Robin Trietsch commented on BEAM-4114:
--------------------------------------
I got the tests running, and I now understand why null values are not allowed.
When I changed the tests to pass null as leftNullValue and rightNullValue, I
saw:
{code:java}
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.611 s
<<< FAILURE! - in org.apache.beam.sdk.extensions.joinlibrary.OuterFullJoinTest
[ERROR]
testJoinNoneToNoneMapping(org.apache.beam.sdk.extensions.joinlibrary.OuterFullJoinTest)
Time elapsed: 0.071 s <<< ERROR!
org.apache.beam.sdk.Pipeline$PipelineExecutionException:
java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: cannot
encode a null String
{code}
The coder module does not allow null values here: the String coder refuses to
encode null. Allowing null as the null value in the join library would
therefore require a significant amount of work in the coder module, which
might break other things as well. I propose leaving this unchanged, even
though the current behaviour is not very developer friendly. [~kenn], maybe
you have other ideas?
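To illustrate the failure in the stack trace above, here is a minimal sketch in plain Java. It does not use Beam's classes: SimpleCoder, StrictStringCoder, NullableWrapper and roundTrip are hypothetical names made up for this example. The strict coder rejects null in the same way the String coder does, and the wrapper shows one possible way around it, by writing a marker byte before the value. Beam has a NullableCoder that I believe follows a similar idea, but whether it could be applied inside the join library is an untested assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullCoderSketch {

  /** Hypothetical stand-in for a coder: writes a value as bytes and reads it back. */
  public interface SimpleCoder<T> {
    void encode(T value, DataOutputStream out) throws IOException;
    T decode(DataInputStream in) throws IOException;
  }

  /** Rejects null, mimicking the error message in the stack trace. */
  public static final class StrictStringCoder implements SimpleCoder<String> {
    @Override
    public void encode(String value, DataOutputStream out) throws IOException {
      if (value == null) {
        throw new IOException("cannot encode a null String");
      }
      out.writeUTF(value);
    }

    @Override
    public String decode(DataInputStream in) throws IOException {
      return in.readUTF();
    }
  }

  /** Wraps another coder and writes one marker byte: 0 = null, 1 = value follows. */
  public static final class NullableWrapper<T> implements SimpleCoder<T> {
    private final SimpleCoder<T> inner;

    public NullableWrapper(SimpleCoder<T> inner) {
      this.inner = inner;
    }

    @Override
    public void encode(T value, DataOutputStream out) throws IOException {
      if (value == null) {
        out.writeByte(0);
        return;
      }
      out.writeByte(1);
      inner.encode(value, out);
    }

    @Override
    public T decode(DataInputStream in) throws IOException {
      return in.readByte() == 0 ? null : inner.decode(in);
    }
  }

  /** Encodes then decodes a value; I/O failures are rethrown unchecked. */
  public static <T> T roundTrip(SimpleCoder<T> coder, T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      coder.encode(value, new DataOutputStream(bytes));
      return coder.decode(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    SimpleCoder<String> strict = new StrictStringCoder();
    SimpleCoder<String> nullable = new NullableWrapper<>(strict);

    System.out.println(roundTrip(nullable, "left"));
    System.out.println(roundTrip(nullable, null));
    try {
      roundTrip(strict, null);
    } catch (UncheckedIOException e) {
      System.out.println(e.getCause().getMessage());
    }
  }
}
```

The point of the sketch is that null support has to live in the coder, not in the join: even if the checkNotNull() calls were removed, the strict coder would still fail once a null value is encoded.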
> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> ------------------------------------------------------------------
>
> Key: BEAM-4114
> URL: https://issues.apache.org/jira/browse/BEAM-4114
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-join-library
> Affects Versions: 2.4.0
> Reporter: Robin Trietsch
> Assignee: Robin Trietsch
> Priority: Major
>
> When using the
> [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-],
> a checkNotNull() is done for the
> [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207]
> and
> [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it makes more sense to allow null values: when a key has no match
> on the other side of the join, you may want the missing value to be null.
> This should be decided by the developer, not by the join library.
> Looking at the source code, this is also supported by
> [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42]
> (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.