Hi Team,

I am looking for an effective back-pressure solution in Kafka. Please find my use case and details below.
Use case: I need to run "some execution" when receiving a Kafka record. That execution could be an external API call, and it is possible that this API sometimes performs poorly, even with an API timeout configured. Now, let's say there is some degradation in the external API: batch processing starts taking much longer, and it may eventually breach max.poll.interval.ms and trigger a rebalance. This would have a ripple effect on the other consumers in the same group.

As-is: The Kafka consumer provides the capability to pause/resume, but this is not a very effective way to back-pressure the flow:
1. Ideally we just need to slow down consumption, not pause it entirely.
2. If we pause the consumer, we then have to remember to resume it.

*Proposal*
To recover from this automatically, we could put a control algorithm (e.g. PID) in place that adjusts *max.poll.records* dynamically, so that rebalancing is avoided. PID (proportional-integral-derivative) control is referenced in many other places as well, such as:
- PIDRateEstimator <https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/rate/PIDRateEstimator.scala>
- tuning-spark-back-pressure-by-simulation <https://richardstartin.github.io/posts/tuning-spark-back-pressure-by-simulation>
- kafka-spark-consumer <https://github.com/dibbhatt/kafka-spark-consumer>

Currently the Kafka consumer does not allow changing max.poll.records at runtime; the proposal is to allow this config to be passed in dynamically.

I am not an expert and may be thinking in an orthogonal direction. Please advise.

Regards,
Vipul
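P.S. To make the idea concrete, here is a rough, purely illustrative sketch of the control loop I have in mind. The class name, gains, and the per-batch time budget are all hypothetical, and the Kafka consumer does not currently accept the resulting value at runtime; that missing capability is exactly what this proposal asks for.

```java
// Illustrative sketch only: a minimal PID controller that proposes a new
// max.poll.records value from the observed batch processing time.
// All names and gain values here are hypothetical, not Kafka APIs.
public class PollRecordsPid {
    private final double kp, ki, kd;   // PID gains (tuning is workload-specific)
    private final int min, max;        // clamp bounds for max.poll.records
    private double integral, prevError;

    public PollRecordsPid(double kp, double ki, double kd, int min, int max) {
        this.kp = kp; this.ki = ki; this.kd = kd;
        this.min = min; this.max = max;
    }

    /**
     * @param current        the max.poll.records value used for the last batch
     * @param targetMillis   processing-time budget per batch (kept well under
     *                       max.poll.interval.ms to leave safety margin)
     * @param observedMillis how long the last batch actually took to process
     * @return proposed max.poll.records for the next poll, clamped to [min, max]
     */
    public int next(int current, double targetMillis, double observedMillis) {
        // Positive error: we finished under budget and can speed back up.
        // Negative error: the external API is degrading; back off.
        double error = (targetMillis - observedMillis) / targetMillis;
        integral += error;
        double derivative = error - prevError;
        prevError = error;
        double adjustment = kp * error + ki * integral + kd * derivative;
        int proposed = (int) Math.round(current * (1.0 + adjustment));
        return Math.max(min, Math.min(max, proposed));
    }

    public static void main(String[] args) {
        PollRecordsPid pid = new PollRecordsPid(0.5, 0.05, 0.1, 1, 500);
        int records = 500;
        // Simulated degradation: each record now takes 20 ms against a 5 s budget,
        // so a full 500-record batch would take 10 s and risk a rebalance.
        for (int i = 0; i < 10; i++) {
            records = pid.next(records, 5_000, records * 20.0);
        }
        // The controller settles well below 500, keeping batches inside the budget.
        System.out.println(records);
    }
}
```

If the consumer accepted this value dynamically, the loop would simply feed the measured batch time into next() after each poll, instead of pausing and remembering to resume.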