[ https://issues.apache.org/jira/browse/KAFKA-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855957#comment-16855957 ]
Sophie Blee-Goldman commented on KAFKA-8478: -------------------------------------------- Also related to KAFKA-7458 > Poll for more records before forced processing > ---------------------------------------------- > > Key: KAFKA-8478 > URL: https://issues.apache.org/jira/browse/KAFKA-8478 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: John Roesler > Priority: Major > > While analyzing the algorithm of Streams's poll/process loop, I noticed the > following: > The algorithm of runOnce is: > {code} > loop0: > long poll for records (100ms) > loop1: > loop2: for BATCH_SIZE iterations: > process one record in each task that has data enqueued > adjust BATCH_SIZE > if loop2 processed any records, repeat loop 1 > else, break loop1 and repeat loop0 > {code} > There's potentially an unwanted interaction between "keep processing as long > as any record is processed" and forcing processing after `max.task.idle.ms`. > If there are two tasks, A and B, and A runs out of records on one input > before B, then B could keep the processing loop running, and hence prevent A > from getting any new records, until max.task.idle.ms expires, at which point > A will force processing on its other input partition. The intent of idling is > to at least give A a chance of getting more records on the empty input, but > under this situation, we'd never even check for more records before forcing > processing. > I'm thinking we should only enforce processing if there was a completed poll > since we noticed the task was missing inputs (otherwise, we may as well not > bother idling at all). -- This message was sent by Atlassian JIRA (v7.6.3#76005)