Eugene Kirpichov created BEAM-2536:
--------------------------------------
Summary: Simplify specifying coders on PCollectionTuple
Key: BEAM-2536
URL: https://issues.apache.org/jira/browse/BEAM-2536
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Eugene Kirpichov
Currently, when using a multi-output ParDo, the user usually has to do one of
the following:
1) Use an anonymous class, new TupleTag<Foo>() {}, to reify the Foo type so
that coder inference works. A frequent problem in this case is that the
anonymous class captures a large enclosing class, which then either fails to
serialize at all or serializes to something bulky.
2) Explicitly call tuple.get(myTag).setCoder(...).
Both of these are suboptimal.
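To make the mechanism behind option 1 concrete, here is a minimal plain-Java sketch that does not depend on Beam. Tag is a hypothetical stand-in for TupleTag, not the Beam class; it only illustrates the general Java behaviour that an empty subclass body records its type argument in class metadata, while a plain instance loses it to erasure.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for TupleTag<T>; this is NOT the Beam class.
class Tag<T> {
    // Returns the type argument recorded by a subclass, or null if it was erased.
    Type elementType() {
        Type superType = getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            // An anonymous subclass such as `new Tag<String>() {}` lands here.
            return ((ParameterizedType) superType).getActualTypeArguments()[0];
        }
        // A plain `new Tag<String>()` has Object as its superclass: nothing recorded.
        return null;
    }
}

// Hypothetical helper class, used only to exercise the two cases.
class ReifyDemo {
    // Plain instantiation: the type argument is erased at runtime.
    static Type plain() {
        return new Tag<String>().elementType();
    }

    // Anonymous subclass: the type argument survives in the class metadata.
    static Type anonymous() {
        return new Tag<String>() {}.elementType();
    }
}
```

Both helpers are static, so the anonymous class here has no enclosing instance to capture. Created inside an instance method of a large class, the same expression can drag that instance into serialization, which is the problem described in option 1.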
Could we have e.g. a constructor for TupleTag that explicitly takes a
TypeDescriptor? Or even a Coder? Or a family of factory methods for
TupleTagList that take these? E.g.:
in.apply(ParDo.of(...).withOutputTags(mainTag, TupleTagList.of(side1,
FooCoder.of()).and(side2, BarCoder.of())));
I would suggest both: the TupleTag constructor should optionally take a
TypeDescriptor, and TupleTagList.of() and .and() should optionally take a Coder.
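The shape of this suggestion can be sketched in plain Java, under stated assumptions. SketchTag, SketchCoder and SketchTagList are hypothetical stand-ins invented for illustration; they are not Beam classes and do not reflect any existing Beam signatures. A Class<T> token stands in for a TypeDescriptor, and each of(...) / and(...) call pairs a tag with its coder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Coder; NOT the Beam interface.
interface SketchCoder<T> {
    String name();
}

// Hypothetical stand-in for TupleTag with an explicit type token,
// so no anonymous subclass is needed to recover the element type.
final class SketchTag<T> {
    final String id;
    final Class<T> type; // stands in for a TypeDescriptor

    SketchTag(String id, Class<T> type) {
        this.id = id;
        this.type = type;
    }
}

// Hypothetical stand-in for TupleTagList whose factory methods also accept a coder.
final class SketchTagList {
    private final Map<SketchTag<?>, SketchCoder<?>> coders;

    private SketchTagList(Map<SketchTag<?>, SketchCoder<?>> coders) {
        this.coders = coders;
    }

    // Starts a list with one tag and its coder.
    static <T> SketchTagList of(SketchTag<T> tag, SketchCoder<T> coder) {
        return new SketchTagList(new LinkedHashMap<>()).and(tag, coder);
    }

    // Returns a new list with the extra tag; the receiver is left unchanged.
    <T> SketchTagList and(SketchTag<T> tag, SketchCoder<T> coder) {
        Map<SketchTag<?>, SketchCoder<?>> copy = new LinkedHashMap<>(coders);
        copy.put(tag, coder);
        return new SketchTagList(copy);
    }

    // Looks up the coder registered for a tag (by identity), or null if absent.
    SketchCoder<?> coderFor(SketchTag<?> tag) {
        return coders.get(tag);
    }

    int size() {
        return coders.size();
    }
}
```

With these stand-ins, a call such as SketchTagList.of(side1, fooCoder).and(side2, barCoder) keeps each coder next to its tag. The generic method signatures make the compiler reject a coder whose element type does not match the tag.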