vvcephei commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-807898796


   Hi @nicodds,
   
   I can see your line of reasoning, but I think there must be something else 
going on there.
   
   When a task is "idling", it does not block the poll loop. Rather, in each 
iteration of the poll loop, the task does roughly the following (pseudocode):
   
   ```
   check if it has records buffered from both inputs
     if so, carry on processing
     if not, check if the idle timeout has been exceeded
       if so, carry on processing
       if not, loop around again and maybe call poll()
   ```
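
As a rough illustration of the steps above, here is a minimal Java sketch of
the per-iteration decision. The class `TaskIdleSketch`, the `Action` enum, and
the `decide` method with its parameters are hypothetical names made up for this
example; this is not the actual Kafka Streams implementation, only one way the
described logic could look.

```java
/** Hypothetical sketch of the per-iteration idling decision; not Kafka Streams source. */
final class TaskIdleSketch {

    /** What the task does in the current iteration of the poll loop. */
    enum Action { PROCESS, LOOP_AGAIN }

    /**
     * @param allInputsBuffered true if every input has at least one buffered record
     * @param idleStartMs       time at which the task started idling (assumed bookkeeping)
     * @param nowMs             current time
     * @param maxTaskIdleMs     the configured idle timeout
     */
    static Action decide(boolean allInputsBuffered, long idleStartMs, long nowMs, long maxTaskIdleMs) {
        if (allInputsBuffered) {
            return Action.PROCESS;      // records on both inputs: carry on processing
        }
        if (nowMs - idleStartMs >= maxTaskIdleMs) {
            return Action.PROCESS;      // idle timeout exceeded: carry on processing anyway
        }
        return Action.LOOP_AGAIN;       // return to the loop, which may call poll() again
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 0L, 10L, 1_000L));
        System.out.println(decide(false, 0L, 10L, 1_000L));
        System.out.println(decide(false, 0L, 1_500L, 1_000L));
    }
}
```

The point of the sketch is that every branch returns immediately: nothing in
the decision sleeps or waits, so idling by itself does not hold up the loop.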
   
   Therefore, I don't think task idling can make you miss your poll interval. 
My guess is that when you set the poll interval lower, it happened to be 
smaller than the amount of time it takes to complete one loop of processing 
each task. In that case, the poll would time out, causing a rebalance.
   
   In fact, my typical advice for cases like yours is the opposite of what this 
PR says: to make sure that the task idle time is _larger_ than the poll 
interval. As Matthias mentioned, task idling is pointless unless we actually 
call poll() again at least once before the timeout. In other words, I think 
your reasoning was correct, but some other factor came into play and caused the 
rebalances.
   
   FYI, this doesn't help you right now, but I have just completed a feature, 
KIP-695, to be released in Kafka 3.0: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-695%3A+Further+Improve+Kafka+Streams+Timestamp+Synchronization
   
   KIP-695 will make it so that you should get the desired join behavior by 
default, without having to mess with the task idling timeout at all. But it's 
not coming until 3.0 is released. Until then, maybe you can try returning the 
poll interval to the default and instead increasing the task idle time to be 
larger than the poll interval.
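
A minimal sketch of that interim suggestion, using only `java.util.Properties`.
It assumes that "poll interval" refers to the `max.poll.interval.ms` setting
and that "task idle time" refers to `max.task.idle.ms`; the comment itself does
not name the config keys. The class `IdleConfigSketch`, the `build` method, and
the example numbers are hypothetical.

```java
import java.util.Properties;

/** Hypothetical helper: derives an idle timeout that is larger than the poll interval. */
final class IdleConfigSketch {

    /**
     * @param maxPollIntervalMs value to use for max.poll.interval.ms (assumed mapping)
     * @param extraIdleMs       how much larger the task idle time should be; must be positive
     */
    static Properties build(long maxPollIntervalMs, long extraIdleMs) {
        if (extraIdleMs <= 0) {
            throw new IllegalArgumentException("extraIdleMs must be positive");
        }
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        // Keep the idle time strictly larger than the poll interval.
        props.setProperty("max.task.idle.ms", Long.toString(maxPollIntervalMs + extraIdleMs));
        return props;
    }

    public static void main(String[] args) {
        // 300000 ms is used here as an example poll interval; check the default for your version.
        Properties props = build(300_000L, 60_000L);
        System.out.println(props.getProperty("max.poll.interval.ms"));
        System.out.println(props.getProperty("max.task.idle.ms"));
    }
}
```

The resulting `Properties` could then be merged into the rest of the Streams
configuration; application id, bootstrap servers, and serdes are omitted here.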
   
   I hope this helps!
   -John



