Hi all,
I've got a PR[1] for adding external transform support to PubsubIO so that
it will work with Python and Go pipelines on Flink, and I am *so* close,
but I've run into questions that the code cannot answer:  I need a human
now.

The brief summary is that everything is working as expected right up to the
point where the Flink portable translator is prepping the pipeline for
submission to Flink.  In this process it replaces
the PubsubMessageWithAttributesCoder with a ByteArrayCoder, in an effort to
make it wire-compatible.  Naturally, the ByteArrayCoder fails to encode
the PubsubMessage that's passed to it.

I don't understand why this replacement is necessary, since the next
transform in the chain is a Java ParDo that seems like it should be fully
capable of using PubsubMessageWithAttributesCoder.
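To make the mismatch concrete, here is a tiny stdlib-only simulation (the
class names are stand-ins of my own, not Beam's): a coder that only knows
how to encode raw bytes will reject a structured message object outright,
which is exactly the failure mode described above.

```python
class FakeByteArrayCoder:
    """Stand-in for Beam's ByteArrayCoder: it encodes raw bytes and
    nothing else."""

    def encode(self, value):
        if not isinstance(value, bytes):
            raise TypeError(
                "expected bytes, got %s" % type(value).__name__)
        return value


class FakePubsubMessage:
    """Stand-in for PubsubMessage: a payload plus string attributes."""

    def __init__(self, payload, attributes):
        self.payload = payload
        self.attributes = attributes


coder = FakeByteArrayCoder()
coder.encode(b"raw bytes work fine")

msg = FakePubsubMessage(b"hello", {"id": "abc-123"})
try:
    coder.encode(msg)  # blows up: the coder has no idea what this is
except TypeError as e:
    print("encode failed:", e)
```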

The pipeline looks like this:

--- java ---
Read<PBegin, PubsubMessage>
ParDo<PubsubMessage, PubsubMessage>
ParDo<PubsubMessage, byte[]>
--- python ---
ParDo<byte[], PubsubMessage>
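For context, the last Java ParDo and the first Python ParDo presumably do
something like the following at the language boundary (a minimal
stdlib-only sketch, not Beam code; the `encode_message`/`decode_message`
helpers and the base64-in-JSON wire format are my own assumptions, the PR
may encode this differently): the payload and attributes get packed into a
single byte[] so that only a trivially portable coder is needed across SDKs.

```python
import base64
import json


def encode_message(payload: bytes, attributes: dict) -> bytes:
    # Java side: flatten a message into one self-describing byte string.
    # Payload is base64'd so arbitrary bytes survive the JSON wrapper.
    return json.dumps({
        "payload": base64.b64encode(payload).decode("ascii"),
        "attributes": attributes,
    }).encode("utf-8")


def decode_message(data: bytes):
    # Python side: the inverse, rebuilding (payload, attributes).
    obj = json.loads(data.decode("utf-8"))
    return base64.b64decode(obj["payload"]), obj["attributes"]
```

The point of a shape like this is that both SDKs only ever need to agree on
ByteArrayCoder at the boundary, rather than on a shared PubsubMessage coder.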

I'm really not sure what the right solution is.  My next course of action
is to compare this to the KafkaIO behavior, since that is the only other
external Read that's currently supported.

Any help is greatly appreciated!

-chad

[1] https://github.com/apache/beam/pull/9268
