On Tue, Aug 20, 2019 at 4:29 AM Maximilian Michels <[email protected]> wrote:
> Hi Chad!
>
> Thank you so much for your feedback. You are 100% on the right track.
> What you are seeing is a core issue that also needs to be solved for
> KafkaIO to be fully usable in other SDKs. I haven't had much time to
> work on this in the past weeks, but now is the time :)
>
> The cross-language implementation recycles source connectors (like
> KafkaIO.Read) which use the legacy source interface. This interface no
> longer exists in portability. The current portability architecture
> assumes that UDFs (this includes KafkaIO.Read) run in an environment,
> not directly on the Runner. This causes issues when there are multiple
> transforms associated with a connector which run with and without an
> environment, e.g. KafkaIO.Read or PubSubIO.Read.
>
> The issue is also tracked here:
> https://jira.apache.org/jira/browse/BEAM-7870
>
> There are some suggestions in the issue. I think the best solution is
> to allow execution of the source API parts of KafkaIO/PubSubIO (on the
> Runner) and the following UDFs (in the environment). Since those do
> not cross the language boundary, it is simply a matter of setting this
> up correctly.

I think another way to fix this would be to introduce an
UnboundedSource-to-SDF converter once we have SDF (Splittable DoFn)
support for unbounded sources in the Java SDK. I think +Lukasz Cwik
<[email protected]> already added something like this for the Python SDK.
This would allow KafkaIO.Read to execute in the Java SDK environment
instead of on the runner.

Thanks,
Cham

> -Max
>
> On 20.08.19 03:45, Chad Dombrova wrote:
> >
> >         I don't understand why this replacement is necessary, since
> >         the next transform in the chain is a java ParDo that seems
> >         like it should be fully capable of using
> >         PubsubMessageWithAttributesCoder.
> >
> >     Not too familiar with Flink, but have you tried using the PubSub
> >     source from a pure Java Flink pipeline? I expect this to work but
> >     haven't tested it. If that works, I agree that the coder
> >     replacement sounds strange.
> > Just spoke with my co-worker and he confirmed that Pubsub Read works
> > in pure Java on Flink. We're going to test sending the same Java
> > pipeline through the portable runner next. Our hypothesis is that it
> > will /not/ work. If it /does/, then that means it's something
> > specific to how the transforms are created when using the expansion
> > service.
> >
> > -chad
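[Editor's note] For readers following the UnboundedSource-to-SDF converter idea mentioned above, here is a standalone toy sketch of the core mechanism in plain Python. This is not Beam code: the names `FakeUnboundedReader`, `OffsetRangeTracker`, and `sdf_like_read` are invented for illustration and are only loosely modeled on Beam concepts. The point is the shape of the conversion: the source's reader keeps its own advance/getCurrent semantics, while an SDF-style restriction tracker gives the runner a handle for checkpointing and splitting, which is what lets the read execute inside the SDK environment rather than on the runner.

```python
class FakeUnboundedReader:
    """Stand-in for an UnboundedSource reader (e.g. a Kafka consumer).

    Exposes the legacy-source-style advance/get_current protocol.
    """

    def __init__(self, records):
        self._records = records
        self._pos = 0

    def advance(self):
        """Move to the next record; return False when none is available."""
        if self._pos < len(self._records):
            self._pos += 1
            return True
        return False

    def get_current(self):
        return self._records[self._pos - 1]

    def get_offset(self):
        """Offset of the current record (0-based)."""
        return self._pos - 1


class OffsetRangeTracker:
    """Minimal restriction tracker: claims offsets in [start, stop).

    In a real SDF, the runner shrinks this range to checkpoint or split.
    """

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = start - 1

    def try_claim(self, offset):
        if offset < self.stop:
            self.last_claimed = offset
            return True
        return False


def sdf_like_read(reader, tracker):
    """The converter's core loop: wrap a source reader in SDF semantics.

    Emit records only while the tracker lets us claim the next offset;
    a failed claim means the runner asked us to stop (checkpoint/split).
    """
    out = []
    while reader.advance():
        if not tracker.try_claim(reader.get_offset()):
            break  # restriction exhausted; resume later from last_claimed
        out.append(reader.get_current())
    return out
```

For example, reading four records against a restriction of [0, 2) stops after two records, which is exactly the checkpointing behavior an SDF gives the runner without the runner ever executing the source directly.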
