It uses aligned checkpoint by default in Flink which needs to process
all the data buffered in the pipeline(network and operators) during
checkpointing. In your use case, as the process speed is very slow and
so it may take too long to process the buffered data. You could try to
enable unalign checkpoint (via configuration
`execution.checkpointing.unaligned.enabled` which is false by default)
and lower the PyFlink bundle size (via configuration
python.fn-execution.bundle.size which is 1000 by default).


On Wed, May 14, 2025 at 1:53 PM Hirson Zhang <milesian...@163.com> wrote:
>
> Hello,
>
> I tried the configuration you mentioned, but it doesn't seem to work. Still, 
> thank you for your response!
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> At 2025-05-13 17:54:03, "Sharath" <dsaishar...@gmail.com> wrote:
> >Hello,
> >
> >Have you tried enabling the buffer debloating feature to improve checkpoint
> >times? Refer taskmanager.network.memory.buffer-debloat.enabled in
> >https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
> >
> >Regards,
> >Sharath
> >
> >On Tue, May 13, 2025 at 1:59 AM 张河川 <milesian...@163.com> wrote:
> >
> >> Hi Flink community,
> >>
> >> I’m encountering an issue with PyFlink where a FlatMap operator invokes an
> >> external service (using a PyTorch model to generate embedding vectors). The
> >> operator processes data very slowly, leading to an extremely long initial
> >> checkpoint start delay, which eventually causes checkpoint failures.The
> >> external service has strict concurrency limits and cannot handle increased
> >> parallel requests,increasing the parallelism of the operator did not
> >> improve performance due to this bottleneck.
> >>
> >> Besides, when I use flink1.20.0, the operator processing speed seems to be
> >> faster than that of flink2.0.0.
> >>
> >> Does anyone have any clue?
> >>
> >> Thank you for your insights!

Reply via email to