> Can you elaborate on the "why", ie, advantages of the new approach? I don't 
> see a big difference in the behavior? What issue does this PR address? (Just 
> for my understanding -- so far, to me it's neither better nor worse -- just 
> different.)

Sure, here's my reasoning:

1. Whenever a partition has accumulated more than
buffered.records.per.partition records (say N, default is 1000), we always
pause that partition. Today we resume a partition as soon as its number of
buffered records drops back to N after one of its records is processed (so we
know that before this record was processed, we had N+1 records).
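
To make the current behavior concrete, here is a minimal sketch of the pause / resume rule described in point 1. The class and method names (`PartitionBuffer`, `add`, `poll`) are hypothetical and only model the rule; this is not the actual Kafka Streams code, and `maxBuffered` stands in for N.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one partition's record buffer (illustration only).
final class PartitionBuffer {
    private final int maxBuffered;                       // N, e.g. 1000
    private final Deque<String> records = new ArrayDeque<>();
    private boolean paused = false;

    PartitionBuffer(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    // Fetched record arrives: pause once more than N records are buffered.
    void add(String record) {
        records.addLast(record);
        if (records.size() > maxBuffered) {
            paused = true;
        }
    }

    // One record is processed: resume as soon as the size is back to N.
    String poll() {
        String record = records.pollFirst();
        if (record != null && records.size() == maxBuffered) {
            paused = false;
        }
        return record;
    }

    boolean isPaused() {
        return paused;
    }

    int size() {
        return records.size();
    }
}
```

With N = 3, the partition is paused when the fourth record is buffered and resumed right after the first record is processed, i.e. it flips between paused and resumed around the N / N+1 boundary.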

2. What we want to achieve concerns the case where enforced processing is
ongoing, i.e. some of the partitions are empty while others are not. In that
case, we do not want to starve the partitions with no data while we keep
fetching and processing the partitions that are not empty. More concretely,
there are a couple of scenarios to consider:

2.a) there is no data coming in for the empty partition during that period of
time, even after the max.idleness has passed. In this case we will have to
fetch / process the other partitions as long as they still have data during
enforced processing.

2.b) there is some data coming in for the empty partition, but only at low
traffic. In this case we want to give these partitions the highest chance of
being fetched among all partitions.
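
As a rough illustration of the condition in point 2, the sketch below classifies the buffered state of a task. Everything here is an assumption for illustration: `EnforcedProcessing`, `partiallyEmpty`, and `shouldProcess` are hypothetical names, and `maxIdleMs` stands in for the max.idleness mentioned above. It is not the logic in the PR.

```java
import java.util.Map;

// Hypothetical helper (illustration only); keys are partition names,
// values are the number of records currently buffered for that partition.
final class EnforcedProcessing {

    // True when some partitions are empty while others still have data.
    static boolean partiallyEmpty(Map<String, Integer> buffered) {
        boolean anyEmpty = buffered.values().stream().anyMatch(n -> n == 0);
        boolean anyNonEmpty = buffered.values().stream().anyMatch(n -> n > 0);
        return anyEmpty && anyNonEmpty;
    }

    // Process normally when every partition has data; otherwise only
    // process (enforced) once the idle time has reached the assumed limit.
    static boolean shouldProcess(Map<String, Integer> buffered,
                                 long idleMs, long maxIdleMs) {
        boolean allHaveData = !buffered.isEmpty()
                && buffered.values().stream().allMatch(n -> n > 0);
        return allHaveData || (partiallyEmpty(buffered) && idleMs >= maxIdleMs);
    }
}
```

Scenario 2.a corresponds to `shouldProcess` returning true only because the idle limit has passed, while the empty partition stays empty. Scenario 2.b is the same state, except that the empty partition could receive data if it were favored on the next fetch.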

[ Full content available at: https://github.com/apache/kafka/pull/5669 ]