Adding a few folks who might help:
+Chamikara Jayalath <[email protected]>
+Lukasz Cwik <[email protected]>
+Maximilian Michels <[email protected]>
On Sat, Aug 17, 2019 at 11:27 AM Chad Dombrova <[email protected]> wrote:

> Hi all,
> I've got a PR [1] <https://github.com/apache/beam/pull/9268> for adding
> external transform support to PubsubIO so that it will work with python and
> go pipelines on Flink, and I am *so* close, but I've run into questions
> that the code cannot answer: I need a human now.
>
> The brief summary is that everything is working as expected right up to
> the point where the Flink portable translator is prepping the pipeline for
> submission to Flink. In this process it is replacing
> the PubsubMessageWithAttributesCoder with a ByteArrayCoder, in an effort to
> make it wire-compatible. Naturally, the ByteArrayCoder fails to encode
> the PubsubMessage that's passed to it.
>
> I don't understand why this replacement is necessary, since the next
> transform in the chain is a java ParDo that seems like it should be fully
> capable of using PubsubMessageWithAttributesCoder.
>
> The pipeline looks like this:
>
> --- java ---
> Read<PBegin, PubsubMessage>
> ParDo<PubsubMessage, PubsubMessage>
> ParDo<PubsubMessage, byte[]>
> --- python --
> ParDo<byte[], PubsubMessage>
>
> I'm really not sure what the right solution is. My next step is to compare
> this with the KafkaIO behavior, since that is the only other external Read
> that's supported.
>
> Any help is greatly appreciated!
>
> -chad
>
> [1] https://github.com/apache/beam/pull/9268
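To make the failure mode concrete, here is a minimal, library-free Python sketch of the mismatch Chad describes. The classes below are simplified stand-ins, not Beam's actual coder implementations: the point is only that a byte-oriented coder cannot serialize a structured message value, which is exactly what happens when the translator swaps in ByteArrayCoder.

```python
class PubsubMessage:
    """Simplified stand-in for a Pub/Sub message (payload + attributes)."""
    def __init__(self, payload: bytes, attributes: dict):
        self.payload = payload
        self.attributes = attributes


class ByteArrayCoder:
    """Stand-in: only knows how to pass raw bytes through unchanged."""
    def encode(self, value):
        if not isinstance(value, bytes):
            raise TypeError(
                f"ByteArrayCoder cannot encode {type(value).__name__}")
        return value


class PubsubMessageWithAttributesCoder:
    """Stand-in: serializes the payload together with its attributes."""
    def encode(self, msg):
        attrs = ";".join(f"{k}={v}" for k, v in sorted(msg.attributes.items()))
        return msg.payload + b"|" + attrs.encode("utf-8")


msg = PubsubMessage(b"hello", {"origin": "sensor-1"})

# The message-aware coder handles the structured value fine:
encoded = PubsubMessageWithAttributesCoder().encode(msg)

# The substituted byte coder raises, mirroring the reported failure:
try:
    ByteArrayCoder().encode(msg)
except TypeError as e:
    print("encoding failed:", e)
```

This is only an illustration of the symptom; the open question in the thread (why the translator forces the substitution at all) is about Beam's portable Flink translation, not the coders themselves.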
