Thank you for the suggestion. I was able to identify the issue: the delay is caused by the combination of my sink and the managed memory allocation.
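
In case it helps anyone who hits the same thing, these are roughly the knobs involved. This is only a sketch using the standard Flink option names with illustrative values, not my exact config.yaml:

    # Fraction of task manager memory reserved as managed memory
    # (used e.g. by the RocksDB state backend and batch operators)
    taskmanager.memory.managed.fraction: 0.4

    # Alternatively, an explicit size can be set instead of a fraction
    # taskmanager.memory.managed.size: 1gb

    # Xuyang's suggestion to unchain operators while debugging
    pipeline.operator-chaining: false
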
On Tue, Jun 24, 2025 at 4:52 AM Xuyang <xyzhong...@163.com> wrote:

> Hi, Siva.
>
> Additionally, you can temporarily set the Flink configuration
> `pipeline.operator-chaining: false` to unchain all operators. This will
> allow you to see if one specific operator is particularly busy, which
> could cause backpressure for all of its inputs.
>
> --
> Best!
> Xuyang
>
> At 2025-06-24 11:00:49, "Matt Cuento" <cuentom...@gmail.com> wrote:
> >Hi Siva,
> >
> >Unfortunately the picture attached does not render for me. Would you be
> >able to send the output of what an `EXPLAIN` statement reveals as your
> >logical plan? This is a good first step to getting an idea of what each
> >operator is doing.
> >
> >Degraded performance over time/after processing many records sounds like
> >it could be a state size issue. Knowing what operators are used may help
> >us know how much state is being stored. There are metrics around state
> >size [1] as well as system resources [2].
> >
> >Matt Cuento
> >cuentom...@gmail.com
> >
> >[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#state-size
> >[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-resources
> >
> >On Mon, Jun 23, 2025 at 4:25 PM Siva Ram Gujju <sivaram2...@gmail.com> wrote:
> >
> >> Hello,
> >> I'm new to Flink and doing a POC. I have a Flink job which reads events
> >> from a Kafka source topic, performs some calculations, and outputs to a
> >> couple of SQL sinks.
> >> I deployed this to a standalone cluster running on my Linux virtual
> >> machine (all default settings).
> >>
> >> Parallelism = 3
> >> Number of task slots allowed in config.yml = 10
> >> Number of task slots required for my job = 3
> >> The rest of the settings are default.
> >>
> >> The job runs fine for the first 100,000 events and the response is near
> >> real time. After that the first operator of the job starts to show Busy
> >> (max): 100% and the processing slows down significantly (see the picture
> >> below). Heap is at 50%.
> >> Source lag (Kafka consumer lag) is 0. Source Kafka cluster CPU is <3%.
> >>
> >> 1. How can I triage what is causing the slowness? Is it a CPU or memory
> >> issue, and how do I find out? Everything looks normal to me. No
> >> exceptions in the logs.
> >> 2. Why did the job run fine for the first 100K events, super fast, and
> >> then start slowing down? Any theory on this?
> >> Please suggest. Thank you!
> >>
> >> [image: Picture 1, Picture]
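
For the archives, in case someone else is debugging a similar pipeline: the logical plan Matt asked about can be printed with Flink SQL's EXPLAIN. A minimal sketch, assuming a source table `events` and a sink table `results` (the actual table names and query in my job are different):

    -- Print the plan for an INSERT pipeline; table names are illustrative.
    EXPLAIN PLAN FOR
    INSERT INTO results
    SELECT user_id, COUNT(*) AS event_count
    FROM events
    GROUP BY user_id;
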