yelianevich commented on issue #31085: URL: https://github.com/apache/beam/issues/31085#issuecomment-2115304242
Hi, I'm facing a similar issue with Beam 2.56.0 and Flink 1.16.3 and Java SDK. The problematic pipeline has parallelism 4 and has many *slowly updating global window side inputs* (uses unbounded `GenerateSequence` as in [patterns](https://beam.apache.org/documentation/patterns/side-inputs/)) that are being updated every hour. A watermark is not being emitted untill all source subtasks emit a message. This is a major issue, since it's not feasible to generate impulses more frequently and the pipeline is not able to make any progress. Another source is Kafka and input topics have 3 partitions. I can see that one of the subtasks become FINISHED after some time and only 3 subtasks are active. Some of them receive very few messages and it's a norm for some partitions not to receive data for some time, this is especially true for testing environment. I've tryied to set `--autoWatermarkInterval=100`, but it did not have any effect. There are no other Beam-specific properties set. Any ideas how to fix this? Should I downgrade Beam version to workaround the problem? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
