lostluck commented on issue #29180: URL: https://github.com/apache/beam/issues/29180#issuecomment-1884012094
I can trigger the "aggressive splitting" if I add a `time.Sleep(time.Millisecond)` to the pipeline, and I can avoid it if I account for any PCollection ElementCount increasing during the progress check interval. That is, if the bundle isn't producing *any* output at all, then split; otherwise, don't touch it.

While this is still a basic heuristic, having the TotalCount of elements emitted as of the last progress check available will let us make it more sophisticated in the future, for example based on the ratio of dataChannel inputs consumed since the last check to total outputs, so that it's not only splitting when a bundle appears to be stuck (e.g., if we produced less than half an output per input since the last progress tick).

Anyway... #29968 might solve your issue, if you're able to patch it in and test it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
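
The heuristic described in the comment above can be sketched roughly as follows. This is only an illustration, not the code in #29968: the `progressSnapshot` type, its fields, and both function names are hypothetical and do not come from the Beam codebase. The first function is the basic "split only if no output was produced" check; the second is one possible reading of the ratio-based refinement mentioned as future work.

```go
package main

import "fmt"

// progressSnapshot is a hypothetical record of what a bundle reported at one
// progress check. It is not a real Beam type.
type progressSnapshot struct {
	InputsConsumed int64 // elements read from the data channel so far
	TotalOutputs   int64 // sum of element counts across all PCollections
}

// shouldSplit implements the basic heuristic: split only when the bundle
// produced no new output between two progress checks.
func shouldSplit(prev, cur progressSnapshot) bool {
	return cur.TotalOutputs <= prev.TotalOutputs
}

// shouldSplitByRatio is an assumed refinement: split when the bundle produced
// fewer than half an output per input consumed since the previous check.
func shouldSplitByRatio(prev, cur progressSnapshot) bool {
	dIn := cur.InputsConsumed - prev.InputsConsumed
	dOut := cur.TotalOutputs - prev.TotalOutputs
	if dIn <= 0 {
		// No inputs consumed: fall back to the "no output at all" check.
		return dOut <= 0
	}
	return 2*dOut < dIn
}

func main() {
	prev := progressSnapshot{InputsConsumed: 10, TotalOutputs: 5}
	stuck := progressSnapshot{InputsConsumed: 20, TotalOutputs: 5}
	busy := progressSnapshot{InputsConsumed: 20, TotalOutputs: 12}

	fmt.Println(shouldSplit(prev, stuck)) // true: no output since last check
	fmt.Println(shouldSplit(prev, busy))  // false: output is still increasing
}
```

With the basic check, a bundle slowed by something like `time.Sleep(time.Millisecond)` is left alone as long as its output count keeps increasing between checks.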
