Ah, this is a point that Robert brings up quite often: one reason we put
coders on PCollections instead of doing that work in PTransforms is that
the runner (plus SDK harness) can automatically serialize only when
necessary. So by default Beam already does what you want. There are some
corner cases when you get to the portability framework, but I am pretty
sure it already works this way. If you show which parts of your example
are PTransforms and which are PCollections, it might show where we can
fix things.
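
To illustrate "serialize only when necessary", here is a minimal plain-Java
sketch. It does not use the Beam API. CountingCoder, Person, toMap and run
are hypothetical names invented for this illustration, the key=value wire
format is a toy stand-in for JSON, and the `fused` flag only mimics a
runner's choice to hand elements between steps in memory instead of
materializing them. How a real runner makes that choice is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoderBoundarySketch {

    /** Hypothetical stand-in for a coder attached to a collection; counts its calls. */
    static final class CountingCoder {
        int encodes = 0;
        int decodes = 0;

        // Toy wire format: one key=value pair per line (assumes no newlines in values).
        byte[] encode(Map<String, String> value) {
            encodes++;
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : value.entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }

        Map<String, String> decode(byte[] bytes) {
            decodes++;
            Map<String, String> out = new LinkedHashMap<>();
            for (String line : new String(bytes, StandardCharsets.UTF_8).split("\n")) {
                if (line.isEmpty()) {
                    continue;
                }
                int i = line.indexOf('=');
                out.put(line.substring(0, i), line.substring(i + 1));
            }
            return out;
        }
    }

    /** Hypothetical element type, mirroring the {name, id} example in the thread. */
    static final class Person {
        final String id;
        final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** The conversion step: Person -> Map<String, String>, done purely in memory. */
    static Map<String, String> toMap(Person p) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("id", p.id);
        m.put("name", p.name);
        return m;
    }

    /**
     * Runs "convert, then consume". When fused, the map is passed directly to the
     * consumer; otherwise it crosses a simulated boundary through the coder.
     */
    static List<String> run(List<Person> input, CountingCoder coder, boolean fused) {
        List<String> names = new ArrayList<>();
        for (Person p : input) {
            Map<String, String> m = toMap(p);
            if (!fused) {
                m = coder.decode(coder.encode(m)); // only needed at a boundary
            }
            names.add(m.get("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("1", "Ada"), new Person("2", "Linus"));

        CountingCoder fusedCoder = new CountingCoder();
        run(people, fusedCoder, true);
        System.out.println("fused encodes: " + fusedCoder.encodes);

        CountingCoder boundaryCoder = new CountingCoder();
        run(people, boundaryCoder, false);
        System.out.println("boundary encodes: " + boundaryCoder.encodes);
    }
}
```

In this sketch the coder is declared once for the intermediate collection
but is invoked only on the non-fused path, which is the reason for keeping
the coder on the collection rather than inside the conversion step.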

On Tue, Jan 30, 2018 at 12:17 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Indeed,
>
> I'll take a stupid example to make it shorter.
> I have a source emitting Person objects ({name:...,id:...}) serialized
> with Jackson as JSON.
> Then my pipeline processes them with a DoFn taking a Map<String, String>.
> Here I set the coder to read the JSON as a map.
>
> However, a Map<String, String> is not a Person, so my pipeline needs an
> intermediate step to convert one into the other, and the design therefore
> includes a useless serialization round trip.
>
> If you check the chain you have: Person -> JSON -> Map<String, String> ->
> JSON -> Map<String, String>, whereas Person -> JSON -> Map<String, String>
> is enough, because the JSON is equivalent at both stages in this example.
>
> In other words, if one coder's output is readable as another coder's
> input, Java's strong typing doesn't know about it and can force some
> artificial steps.
>
>
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github
> <https://github.com/rmannibucau> | LinkedIn
> <https://www.linkedin.com/in/rmannibucau>
>
> 2018-01-30 21:07 GMT+01:00 Kenneth Knowles <k...@google.com>:
>
>> I'm not sure I understand your question. Can you explain more?
>>
>> On Tue, Jan 30, 2018 at 11:50 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> just encountered an issue with the pipeline API and wondered if you
>>> thought about it.
>>>
>>> It can happen that Coders are compatible with each other. A simple
>>> example: a text-based coder such as JSON or XML can read plain text.
>>> However, the pipeline API does not support this directly and forces the
>>> user to go through an intermediate typed state.
>>>
>>> Is there already a way to avoid these useless round trips?
>>>
>>> Put differently: how should coder transitivity be handled?
>>>
>>> Romain Manni-Bucau
>>>
>>
>>
>
