[ 
https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473424#comment-16473424
 ] 

Robin Trietsch commented on BEAM-4114:
--------------------------------------

I got the tests running, and I now understand why null values are not 
allowed. When I changed the code to accept null for leftNullValue and 
rightNullValue, the tests failed with:
{code:java}
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.611 s <<< FAILURE! - in org.apache.beam.sdk.extensions.joinlibrary.OuterFullJoinTest
[ERROR] testJoinNoneToNoneMapping(org.apache.beam.sdk.extensions.joinlibrary.OuterFullJoinTest) Time elapsed: 0.071 s <<< ERROR!
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: cannot encode a null String
{code}
The coders do not accept null values: here, the String coder refuses to 
encode null. Allowing null as the null value in the join library would 
therefore require a significant amount of work in the coder module, which 
might break other things as well. I propose not to change this, even though 
the current behaviour is not very developer friendly. [~kenn], maybe you have 
other ideas?
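
To illustrate the failure above, here is a minimal, self-contained sketch of a coder that rejects null and of a wrapper that tolerates it by writing a one-byte presence marker first. All names in it (SketchCoder, StrictStringCoder, NullableSketchCoder, CoderSketch) are hypothetical stand-ins and not Beam's API. Beam ships its own NullableCoder built on a similar idea, but whether it could be wired into the join library without side effects is not something this sketch settles.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a coder interface; this is NOT Beam's Coder class.
interface SketchCoder<T> {
    void encode(T value, DataOutputStream out) throws IOException;
    T decode(DataInputStream in) throws IOException;
}

// Mimics a String coder that refuses null, like the one in the stack trace.
final class StrictStringCoder implements SketchCoder<String> {
    @Override
    public void encode(String value, DataOutputStream out) throws IOException {
        if (value == null) {
            throw new IOException("cannot encode a null String");
        }
        out.writeUTF(value);
    }

    @Override
    public String decode(DataInputStream in) throws IOException {
        return in.readUTF();
    }
}

// Wraps another coder: writes 0 for null, or 1 followed by the inner encoding.
final class NullableSketchCoder<T> implements SketchCoder<T> {
    private final SketchCoder<T> inner;

    NullableSketchCoder(SketchCoder<T> inner) {
        this.inner = inner;
    }

    @Override
    public void encode(T value, DataOutputStream out) throws IOException {
        if (value == null) {
            out.writeByte(0);
            return;
        }
        out.writeByte(1);
        inner.encode(value, out);
    }

    @Override
    public T decode(DataInputStream in) throws IOException {
        return in.readByte() == 0 ? null : inner.decode(in);
    }
}

final class CoderSketch {
    // Encodes a value to bytes and decodes it again; I/O errors become unchecked.
    static <T> T roundTrip(SketchCoder<T> coder, T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            coder.encode(value, new DataOutputStream(bytes));
            return coder.decode(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the strict coder, CoderSketch.roundTrip(new StrictStringCoder(), null) throws, while wrapping it in NullableSketchCoder lets null survive the round trip. The cost is one extra byte per value and a different wire format, which is the kind of change that could affect other users of the coder.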

> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> ------------------------------------------------------------------
>
>                 Key: BEAM-4114
>                 URL: https://issues.apache.org/jira/browse/BEAM-4114
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-join-library
>    Affects Versions: 2.4.0
>            Reporter: Robin Trietsch
>            Assignee: Robin Trietsch
>            Priority: Major
>
> When using the 
> [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-],
>  a checkNotNull() is done for the 
> [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207]
>  and 
> [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it makes more sense to allow null values: when a key has no match 
> on the other side of the join, you may want the missing value to be null. 
> This should be decided by the developer, and not by the join library.
> Looking at the source code, this is also supported by 
> [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42]
>  (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
