> Can you elaborate on the "why", i.e., the advantages of the new approach? I don't
> see a big difference in the behavior. What issue does this PR address? (Just
> for my understanding: so far, to me it's neither better nor worse, just
> different.)
Sure, here's my reasoning:

1. Whenever a partition has accumulated more than num.records.per.partition records (say N; the default is 1000), we always pause that partition. Today we resume the partition immediately once its number of buffered records drops back to N after one record has been processed (so we know that, before that record was processed, the partition had N+1 records buffered).

2. What we want to address is the case where enforced processing is ongoing, i.e. some of the partitions are empty while others are not. In that case, we do not want to starve the empty partitions while we keep fetching and processing the non-empty ones. More concretely, there are a couple of scenarios to consider:

   2.a) No data arrives for the empty partition during that period, even after max.idleness has passed. In this case we have to keep fetching and processing the other partitions, as long as they still have data, during enforced processing.

   2.b) Some data does arrive for the empty partition, but only at low traffic. In this case we want to give that partition the highest chance of being fetched among all the partitions.

[ Full content available at: https://github.com/apache/kafka/pull/5669 ]
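To make the current pause/resume rule from point 1 concrete, here is a minimal Java sketch under simplifying assumptions. It is not the Kafka Streams implementation: the class PartitionBufferSketch, its methods, and the use of plain strings for partitions and records are all hypothetical, and a set of paused partition names stands in for actually pausing or resuming a consumer. It only models the two thresholds: pause once a partition buffers more than N records, and resume as soon as processing brings it back down to exactly N.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of per-partition buffering with
// pause/resume thresholds; not the actual Kafka Streams code.
class PartitionBufferSketch {
    private final int maxBufferedSize; // N, e.g. num.records.per.partition
    private final Map<String, Deque<String>> buffers = new HashMap<>();
    private final Set<String> paused = new HashSet<>();

    PartitionBufferSketch(int maxBufferedSize) {
        this.maxBufferedSize = maxBufferedSize;
    }

    // Called after a fetch: buffer the records, and pause the partition
    // once it holds more than N records.
    void addRecords(String partition, List<String> records) {
        Deque<String> queue = buffers.computeIfAbsent(partition, p -> new ArrayDeque<>());
        queue.addAll(records);
        if (queue.size() > maxBufferedSize) {
            paused.add(partition);
        }
    }

    // Process one record; resume as soon as the buffer is back down to N,
    // i.e. it held N + 1 records before this one was processed.
    String processOne(String partition) {
        Deque<String> queue = buffers.get(partition);
        if (queue == null || queue.isEmpty()) {
            return null;
        }
        String record = queue.pollFirst();
        if (queue.size() == maxBufferedSize) {
            paused.remove(partition);
        }
        return record;
    }

    boolean isPaused(String partition) {
        return paused.contains(partition);
    }

    int buffered(String partition) {
        Deque<String> queue = buffers.get(partition);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        PartitionBufferSketch sketch = new PartitionBufferSketch(3); // N = 3
        sketch.addRecords("p0", List.of("a", "b", "c"));
        System.out.println(sketch.isPaused("p0")); // false: 3 records, not more than N
        sketch.addRecords("p0", List.of("d"));
        System.out.println(sketch.isPaused("p0")); // true: 4 > N
        sketch.processOne("p0");
        System.out.println(sketch.isPaused("p0")); // false: back to exactly N
    }
}
```

One consequence of this rule, at least in the sketch: because resumption happens only one record below the pause threshold, a partition with steady traffic can flip between paused and resumed after nearly every processed record.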
