Hi everyone, We're seeing a dramatic regression in the Python SDK with Flink + Beam 2.18.0 after upgrading from 2.16.0. Note that the Flink version is unchanged. After bisecting the problem, we found that https://jira.apache.org/jira/browse/BEAM-8944 is the cause for this.
The dynamic thread creation adds a significant delay to the bundle completion which causes checkpoint times to increase 2-3x. I've tried increasing the thread life time but that did not change anything. Reverting to the static thread pool gives us 2.16.0 performance. Looking at the code, I don't see something obviously wrong, but the new locks and the use of threading.Event/threading.Condition could be too much for the Python interpreter. -Max
