[
https://issues.apache.org/jira/browse/BEAM-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099290#comment-16099290
]
ASF GitHub Bot commented on BEAM-1820:
--------------------------------------
GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/3634
[BEAM-1820] Source.getDefaultOutputCoder() throws
CannotProvideCoderException
This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but
it turned out that a lot more needs to be done to fix it properly, and the
necessary changes require knowledge of sufficiently dark corners of the SDK and
runners that asking an external contributor to do this is unfair, so I did the
changes myself.
TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to
succeed.
Source.getDefaultOutputCoder is only a hint for inferring the Coder of its
elements, and it should not be required to succeed.
E.g. when a runner is replacing a Read.from(source) transform, and the
override needs to know a coder for elements of the source, if the source
doesn't provide a coder, the user may have set a coder on the returned
PCollection explicitly. In this case, the runner should use that coder.
In other cases, when an API uses a Source and needs its coder, it must let
the caller provide a Coder explicitly (e.g. SourceTestUtils).
I am treating this as a backwards-compatible change because
Source.getDefaultOutputCoder() is a runner-facing rather than user-facing API.
From the point of view of a user implementing a Source, adding a throws to the
base class method signature is compatible.
R: @tgroh
CC: @lgajowy
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam source-coder-fallback
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3634.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3634
----
commit 830c5657736ffe49e06e16039f947e7227fa45a7
Author: Eugene Kirpichov <[email protected]>
Date: 2017-07-24T23:38:27Z
[BEAM-1820] Source.getDefaultOutputCoder() throws
CannotProvideCoderException
This is based on https://github.com/apache/beam/pull/3549 by @lgajowy,
but it turned out that a lot more needs to be done to fix it properly,
and the necessary changes require knowledge of sufficiently dark
corners of the SDK and runners that asking an external contributor
to do this is unfair, so I did the changes myself.
TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to
succeed.
Source.getDefaultOutputCoder is only a hint for inferring the Coder
of its elements, and it should not be required to succeed.
E.g. when a runner is replacing a Read.from(source) transform,
and the override needs to know a coder for elements of the source,
if the source doesn't provide a coder, the user may have set
a coder on the returned PCollection explicitly. In this case,
the runner should use that coder.
In other cases, when an API uses a Source and needs its coder,
it must let the caller provide a Coder explicitly (e.g.
SourceTestUtils).
----
> Source.getDefaultOutputCoder() should be @Nullable
> --------------------------------------------------
>
> Key: BEAM-1820
> URL: https://issues.apache.org/jira/browse/BEAM-1820
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Ćukasz Gajowy
> Labels: easyfix, starter
>
> Source.getDefaultOutputCoder() returns a coder for elements produced by the
> source.
> However, the Source objects are nearly always hidden from the user and
> instead encapsulated in a transform. Often, an enclosing transform has a
> better idea of what coder should be used to encode these elements (e.g. a
> user supplied a Coder to that transform's configuration). In that case, it'd
> be good if Source.getDefaultOutputCoder() could just return null, and coder
> would have to be handled by the enclosing transform or perhaps specified on
> the output of that transform explicitly.
> Right now there's a bunch of code in the SDK and runners that assumes
> Source.getDefaultOutputCoder() returns non-null. That code would need to be
> fixed to instead use the coder set on the collection produced by
> Read.from(source).
> It all appears pretty easy to fix, so this is a good starter item.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)