Hi,

As part of a project that I'm developing, I'm extending Flink 1.2 to
support load shedding. I'm doing some performance tests to check the
performance impact of my changes compared to Flink 1.2 release.

>From the results that I'm getting, I can see that load shedding is working
and that incoming events are being dropped (the lag in the input Kafka
topic also remains ~0).

But when I look at the latency in Flink, it seems that when load shedding
triggers, the latency starts  growing to values above 5 seconds (I don't
see the same behavior on Flink 1.2. release). Before load shedding
triggers, the latency remains similar.

Looking at the git diff with the changes that I did on the application
runtime side, there's only one that is in the critical path of the
processing pipeline. In the RecordWriter.emit
<https://github.com/apache/flink/blob/066b66dd37030bb94bd179d7fb2280876be038c5/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java#L105>
I simply add a condition that randomly skips the step of sending the record
to the target (sendToTarget) with a given probability (dropProbability <
random.nextDouble()).

Does adding those random drops have side effects on some other component in
Flink, causing the latencies to increase?

Thanks,
Luís Alves

Reply via email to