[ https://issues.apache.org/jira/browse/KAFKA-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-9756.
----------------------------------
    Fix Version/s: 2.6.0
       Resolution: Fixed

> Refactor the main loop to process more than one record of one task at a time
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-9756
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9756
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.6.0
>
>
> Our current main loop is implemented as follows:
> 1. Loop over all tasks that have records to process, processing one record from each task at a time.
> 2. After processing one record from each task, check whether commit / punctuate / poll etc. is needed.
> Because we process one record from a task and then move on to the next task, we effectively spend a lot of time on context switches.
> Maybe we can first investigate what happens if we just host each task on an individual thread, and see whether the context-switch cost is not already worse (which would mean our current implementation is already a baseline). If that is true, we can consider working on one task at a time and see whether it is more efficient.
> For numIterations iterations:
> 1. Process one record from each of the tasks the thread owns.
> 2. Check whether commit / punctuate / poll / etc. is needed.
> But in step 1 above we process tasks as A,B,C,A,B,C,..., which effectively introduces context switches within the thread, because it needs to load the task's variables etc. for each record processed.
> What I was thinking is to process tasks as A,A,A,B,B,B,C,C,C,... so that we can reduce the context switches.
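The two processing orders discussed in the issue can be compared with a toy model. This is a minimal sketch, assuming each task is nothing more than a queue of buffered records and treating a "context switch" as any change of task between two consecutive processed records. The names round_robin, batched, task_switches, make_tasks and max_per_task are hypothetical and are not Kafka Streams APIs; the real StreamThread/StreamTask code is not modeled.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical model: each task is a named FIFO queue of buffered record ids.
Tasks = Dict[str, deque]
Order = List[Tuple[str, int]]


def round_robin(tasks: Tasks) -> Order:
    """Current loop as described: one record per task, then the next task."""
    order: Order = []
    while any(tasks.values()):  # an empty deque is falsy
        for name, queue in tasks.items():
            if queue:
                order.append((name, queue.popleft()))
        # After each full pass the real loop would check commit / punctuate / poll.
    return order


def batched(tasks: Tasks, max_per_task: int) -> Order:
    """Proposed loop: up to max_per_task records from one task before moving on."""
    if max_per_task < 1:
        raise ValueError("max_per_task must be >= 1")
    order: Order = []
    while any(tasks.values()):
        for name, queue in tasks.items():
            for _ in range(min(max_per_task, len(queue))):
                order.append((name, queue.popleft()))
        # Same commit / punctuate / poll check point, now after each batched pass.
    return order


def task_switches(order: Order) -> int:
    """Count consecutive record pairs that belong to different tasks."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev[0] != cur[0])


def make_tasks() -> Tasks:
    """Three tasks A, B, C with three buffered records each."""
    return {name: deque(range(3)) for name in "ABC"}


if __name__ == "__main__":
    rr = round_robin(make_tasks())
    print("".join(name for name, _ in rr), task_switches(rr))  # ABCABCABC 8
    b = batched(make_tasks(), max_per_task=3)
    print("".join(name for name, _ in b), task_switches(b))  # AAABBBCCC 2
```

With three records per task, the current order produces 8 task switches while batching three records per task produces 2. Capping the batch with a parameter like max_per_task is one way to keep a single busy task from delaying the others (and the commit / punctuate / poll checks) indefinitely.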