Thank you for the suggestion. I was able to identify the issue: the delay is caused by the combination of my sink and the managed memory allocation.
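
In case it helps anyone who hits the same thing, these are roughly the knobs involved. This is only a sketch using the standard Flink option names with illustrative values, not my exact config.yaml:

    # Fraction of task manager memory reserved as managed memory
    # (used e.g. by the RocksDB state backend and batch operators)
    taskmanager.memory.managed.fraction: 0.4

    # Alternatively, an explicit size can be set instead of a fraction
    # taskmanager.memory.managed.size: 1gb

    # Xuyang's suggestion to unchain operators while debugging
    pipeline.operator-chaining: false
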
On Tue, Jun 24, 2025 at 4:52 AM Xuyang <xyzhong...@163.com> wrote:

> Hi, Siva.
>
> Additionally, you can temporarily set the Flink configuration
> `pipeline.operator-chaining: false` to unchain all operators. This will
> allow you to see if one specific operator is particularly busy, which
> could cause backpressure for all of its inputs.
>
> --
> Best!
> Xuyang
>
> At 2025-06-24 11:00:49, "Matt Cuento" <cuentom...@gmail.com> wrote:
> >Hi Siva,
> >
> >Unfortunately the picture attached does not render for me. Would you be
> >able to send the output of what an `EXPLAIN` statement reveals as your
> >logical plan? This is a good first step to getting an idea of what each
> >operator is doing.
> >
> >Degraded performance over time/after processing many records sounds like
> >it could be a state size issue. Knowing what operators are used may help
> >us know how much state is being stored. There are metrics around state
> >size [1] as well as system resources [2].
> >
> >Matt Cuento
> >cuentom...@gmail.com
> >
> >[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#state-size
> >[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-resources
> >
> >On Mon, Jun 23, 2025 at 4:25 PM Siva Ram Gujju <sivaram2...@gmail.com> wrote:
> >
> >> Hello,
> >> I'm new to Flink and doing a POC. I have a Flink job which reads events
> >> from a Kafka source topic, performs some calculations, and outputs to a
> >> couple of SQL sinks.
> >> I deployed this to a standalone cluster running on my Linux virtual
> >> machine (all default settings).
> >>
> >> Parallelism = 3
> >> Number of task slots allowed in config.yml = 10
> >> Number of task slots required for my job = 3
> >> The rest of the settings are default.
> >>
> >> The job runs fine for the first 100,000 events and the response is near
> >> real time. After that the first operator of the job starts to show Busy
> >> (max): 100% and the processing slows down significantly (see the picture
> >> below). Heap is at 50%.
> >> Source lag (Kafka consumer lag) is 0. Source Kafka cluster CPU is <3%.
> >>
> >> 1. How can I triage what is causing the slowness? Is it a CPU or memory
> >> issue, and how do I find out? Everything looks normal to me. No
> >> exceptions in the logs.
> >> 2. Why did the job run fine for the first 100K events, super fast, and
> >> then start slowing down? Any theory on this?
> >> Please suggest. Thank you!
> >>
> >> [image: Picture 1, Picture]
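
For the archives, in case someone else is debugging a similar pipeline: the logical plan Matt asked about can be printed with Flink SQL's EXPLAIN. A minimal sketch, assuming a source table `events` and a sink table `results` (the actual table names and query in my job are different):

    -- Print the plan for an INSERT pipeline; table names are illustrative.
    EXPLAIN PLAN FOR
    INSERT INTO results
    SELECT user_id, COUNT(*) AS event_count
    FROM events
    GROUP BY user_id;
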