Hi Siva,

Unfortunately the attached picture does not render for me. Could you send the output of an `EXPLAIN` statement for your query, so we can see the logical plan? This is a good first step toward understanding what each operator is doing.
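Since the screenshot did not come through, a text version of the per-operator busy/backpressure numbers would also be useful. Below is a minimal sketch of how they could be pulled from the JobManager's REST API. It assumes the REST endpoint is at `http://localhost:8081` (the default for a standalone cluster) and that the aggregated subtask metrics endpoint returns entries shaped like `{"id": ..., "max": ...}`; please check both against your Flink version. The helper names (`get_json`, `summarize`, `report`) are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: default REST address of a standalone cluster; adjust as needed.
BASE_URL = "http://localhost:8081"

# Task-level time metrics, reported in milliseconds per second.
METRICS = (
    "busyTimeMsPerSecond",
    "backPressuredTimeMsPerSecond",
    "idleTimeMsPerSecond",
)


def get_json(path):
    """Fetch a REST path from the JobManager and decode the JSON body."""
    with urlopen(BASE_URL + path) as resp:
        return json.load(resp)


def summarize(vertex_name, aggregated):
    """Turn one vertex's aggregated metrics into a one-line text summary.

    `aggregated` is assumed to look like
    [{"id": "busyTimeMsPerSecond", "max": 1000.0, ...}, ...].
    Values are ms per second, so dividing by 10 gives a percentage.
    """
    by_id = {m["id"]: m for m in aggregated}
    parts = []
    for name in METRICS:
        entry = by_id.get(name)
        if entry is None or entry.get("max") is None:
            parts.append(f"{name}=n/a")
        else:
            parts.append(f"{name}(max)={float(entry['max']) / 10:.0f}%")
    return f"{vertex_name}: " + ", ".join(parts)


def report(job_id):
    """Print one summary line per job vertex (needs a running cluster)."""
    job = get_json(f"/jobs/{job_id}")
    for vertex in job["vertices"]:
        path = (
            f"/jobs/{job_id}/vertices/{vertex['id']}/subtasks/metrics?get="
            + ",".join(METRICS)
        )
        print(summarize(vertex["name"], get_json(path)))
```

Calling `report("<your job id>")` with the job ID shown in the web UI should print one line per operator chain, which you can paste into a reply in place of the picture.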

Degraded performance over time, or after processing many records, sounds like it could be a state size issue. Knowing which operators are used would help us estimate how much state is being stored. There are metrics for state size [1] as well as system resources [2].

Matt Cuento
cuentom...@gmail.com

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#state-size
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#system-resources

On Mon, Jun 23, 2025 at 4:25 PM Siva Ram Gujju <sivaram2...@gmail.com> wrote:

> Hello,
> I'm new to Flink and doing a POC. I have a Flink job which reads events
> from a kafka source Topic, performs some calculations and outputs a couple
> of SQL sinks.
> I deployed this to a stand alone cluster running on my linux virtual
> machine (all default settings).
>
> Parallelism=3
> NoOfTaskSlots allowed in config.yml=10
> NoOfTaskSlots required for my job=3
> Rest of the settings are default.
>
> The job runs fine for the first 100,000 event and the response is near
> real time. After that the first operator of the job starts to show Busy
> (max): 100% and the processing slows down significantly (see below picture).
> Heap is at 50%.
> Source Lag (kafka consumers lag) is 0. Source Kafka cluster CPU is <3%.
>
>
> 1. How can I triage what is causing slowness? Is it a CPU or Memory issue,
> how do I find it? Everything looks normal to me. No exceptions in logs.
> 2. Why did the job run fine for the 100K event super fast and started
> slowing down? Any theory on this?
> Please suggest. Thank you!
>
>
> [image: Picture 1, Picture]
>