Hello, Have you tried enabling the buffer debloating feature to improve checkpoint times? Refer taskmanager.network.memory.buffer-debloat.enabled in https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
Regards, Sharath On Tue, May 13, 2025 at 1:59 AM 张河川 <milesian...@163.com> wrote: > Hi Flink community, > > I’m encountering an issue with PyFlink where a FlatMap operator invokes an > external service (using a PyTorch model to generate embedding vectors). The > operator processes data very slowly, leading to an extremely long initial > checkpoint start delay, which eventually causes checkpoint failures.The > external service has strict concurrency limits and cannot handle increased > parallel requests,increasing the parallelism of the operator did not > improve performance due to this bottleneck. > > Besides, when I use flink1.20.0, the operator processing speed seems to be > faster than that of flink2.0.0. > > Does anyone have any clue? > > Thank you for your insights!