The Go SDK doesn't currently have the ability to self-split bundles. This is being worked on as part of SplittableDoFns.
Unfortunately, this means the only source that works in streaming mode is the Google PubSubIO source, which only works on Dataflow, since that runner replaces the node with an internal implementation (same as Java and Python). With proper windowing, GroupByKey will behave as expected once ParDos can self-checkpoint. Another approach that will be worked on over the summer is supporting Cross Language Transforms, so that Go pipelines can re-use the Java IOs, which would also resolve this problem.

Thank you for your understanding as we work to get the Go SDK out of its experimental state.

On Mon, Feb 3, 2020, 9:13 AM Anand Singh Kunwar <[email protected]> wrote:

> Hi
> I have been trying out the Apache Beam Go SDK for running aggregation on
> logs from Kafka. I have been trying to write a Kafka reader using ParDoFns
> and aggregating the result using WindowInto and GroupByKey. However, the
> `GroupByKey` only outputs results once I finish emitting my Kafka messages.
> I am using the direct runner for testing this out. It seems like a bug in
> the runner/GroupByKey implementation, as it appears to assume that the
> upstream output will be exhausted even in the case of windowed input.
>
> Pipeline representation:
> impulse -> kafka consumption ParDo (consumes kafka messages in a loop and
> emits metrics with beam.EventTime) -> add timestamp ParDo ->
> WindowInto(fixed windows) -> add window max timestamp as key ParDo ->
> *GroupByKey*
>
> The *GroupByKey* gets stuck until the first kafka consumption ParDo exits.
> This doesn't seem like the desired behavior. I haven't been able to pinpoint
> the exact line of code where this gets stuck. Can someone help me out with
> this?
>
> Thanks for your help.
>
> Best
> Anand Singh Kunwar
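
For reference, here is a minimal Go sketch of the pipeline shape described in the mail above. consumeKafka, addWindowMaxKey, and logCounts are hypothetical stand-ins (the Go SDK has no built-in Kafka source, so the consuming loop is faked with a few fixed messages), and the import paths assume the pre-module-path-change SDK layout. On the direct runner today, the GroupByKey only fires after consumeKafka returns, because a running bundle cannot yet be split or checkpointed.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/core/graph/mtime"
	"github.com/apache/beam/sdks/go/pkg/beam/core/graph/window"
	"github.com/apache/beam/sdks/go/pkg/beam/log"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

// consumeKafka stands in for the Kafka-consuming loop from the original mail
// (it also covers the "add timestamp" step by emitting with an event time).
// A real implementation would poll a consumer indefinitely; here it emits a
// few fixed messages so the sketch compiles and runs. Because bundles cannot
// yet self-checkpoint, the downstream GroupByKey on the direct runner only
// fires after this function returns.
func consumeKafka(_ []byte, emit func(beam.EventTime, string)) {
	now := time.Now()
	for i := 0; i < 3; i++ {
		emit(mtime.FromTime(now.Add(time.Duration(i)*time.Second)), fmt.Sprintf("msg-%d", i))
	}
}

// addWindowMaxKey keys each element by the end of its one-minute window,
// computed from the element's own event timestamp (a stand-in for the
// "add window max timestamp as key" step).
func addWindowMaxKey(ts beam.EventTime, v string) (int64, string) {
	size := int64(time.Minute / time.Millisecond)
	return (int64(ts)/size + 1) * size, v
}

// logCounts counts the grouped values per window key and logs the result.
func logCounts(ctx context.Context, key int64, values func(*string) bool) {
	var v string
	n := 0
	for values(&v) {
		n++
	}
	log.Infof(ctx, "window ending %d: %d messages", key, n)
}

func main() {
	beam.Init()
	p := beam.NewPipeline()
	s := p.Root()

	imp := beam.Impulse(s)
	msgs := beam.ParDo(s, consumeKafka, imp)
	windowed := beam.WindowInto(s, window.NewFixedWindows(time.Minute), msgs)
	keyed := beam.ParDo(s, addWindowMaxKey, windowed)
	grouped := beam.GroupByKey(s, keyed)
	beam.ParDo0(s, logCounts, grouped)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Exitf(context.Background(), "pipeline failed: %v", err)
	}
}
```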

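As a point of comparison, below is a minimal sketch of the PubSub source mentioned above, which is currently the only source usable for streaming from Go, and only when running on Dataflow, since that runner swaps the node for its internal implementation. The project and topic names are placeholders.

```go
package main

import (
	"context"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio"
	"github.com/apache/beam/sdks/go/pkg/beam/log"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

const (
	project = "my-gcp-project" // placeholder GCP project
	topic   = "my-topic"       // placeholder PubSub topic
)

// logMessage logs each raw PubSub payload.
func logMessage(ctx context.Context, msg []byte) {
	log.Infof(ctx, "got message: %s", msg)
}

func main() {
	beam.Init()
	p := beam.NewPipeline()
	s := p.Root()

	// On Dataflow this node is replaced by the runner's internal streaming
	// PubSub implementation; on the direct runner it will not behave as a
	// streaming source.
	msgs := pubsubio.Read(s, project, topic, &pubsubio.ReadOptions{})
	beam.ParDo0(s, logMessage, msgs)

	if err := beamx.Run(context.Background(), p); err != nil {
		log.Exitf(context.Background(), "pipeline failed: %v", err)
	}
}
```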