Hi all, I've got a PR [1] for adding external transform support to PubsubIO so that it will work with Python and Go pipelines on Flink, and I am *so* close, but I've run into questions that the code cannot answer: I need a human now.
The brief summary is that everything works as expected right up to the point where the Flink portable translator is prepping the pipeline for submission to Flink. During this process it replaces the PubsubMessageWithAttributesCoder with a ByteArrayCoder, in an effort to make it wire-compatible. Naturally, the ByteArrayCoder fails to encode the PubsubMessage that's passed to it. I don't understand why this replacement is necessary, since the next transform in the chain is a Java ParDo that seems fully capable of using PubsubMessageWithAttributesCoder.

The pipeline looks like this:

--- java ---
Read<PBegin, PubsubMessage>
ParDo<PubsubMessage, PubsubMessage>
ParDo<PubsubMessage, byte[]>
--- python ---
ParDo<byte[], PubsubMessage>

I'm really not sure what the right solution is. My next course of action is to compare this to the KafkaIO behavior, since that is the only other external Read that's supported.

Any help is greatly appreciated!

-chad

[1] https://github.com/apache/beam/pull/9268
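P.S. In case it helps anyone reason about the failure mode: here's a toy sketch of why the substituted coder blows up at runtime. These are hypothetical stand-in classes, not the real Beam SDK types -- the point is just that a Coder<byte[]> held through a raw/erased reference only discovers the type mismatch when encode() is actually called on a PubsubMessage.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Minimal stand-ins for the types involved (NOT the real SDK classes).
interface Coder<T> {
    void encode(T value, OutputStream out) throws IOException;
}

class ByteArrayCoder implements Coder<byte[]> {
    public void encode(byte[] value, OutputStream out) throws IOException {
        out.write(value);
    }
}

class PubsubMessage {
    final byte[] payload;
    PubsubMessage(byte[] payload) { this.payload = payload; }
}

public class CoderMismatch {
    // Returns true if encoding a PubsubMessage through a raw-typed
    // ByteArrayCoder fails, mirroring what the translated pipeline hits.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean encodeFails() throws IOException {
        Coder raw = new ByteArrayCoder();       // translator swapped this in
        Object element = new PubsubMessage(new byte[] {1, 2, 3});
        try {
            // Erasure means the cast to byte[] happens here, at encode time.
            raw.encode(element, new ByteArrayOutputStream());
            return false;
        } catch (ClassCastException e) {
            return true; // cannot encode PubsubMessage as byte[]
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("encode failed: " + encodeFails());
    }
}
```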