[ 
https://issues.apache.org/jira/browse/KAFKA-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9756.
----------------------------------
    Fix Version/s: 2.6.0
       Resolution: Fixed

> Refactor the main loop to process more than one record of one task at a time
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-9756
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9756
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.6.0
>
>
> Our current main loop is implemented as follows:
> 1. Loop over all tasks that have records to process, processing one record
> from each task in turn.
> 2. After processing one record from each task, check if commit /
> punctuate / poll etc. is needed.
> Because we process one record at a time from a task and then move on to
> the next task, we are effectively spending lots of time on context switches.
> Maybe we can first investigate what happens if each task is hosted by an
> individual thread, and see whether the context switch cost is already no
> worse (which would mean our current implementation is already a baseline). If
> that's true we can consider working on one task at a time, and see if it is
> more efficient.
> For num.Iterations:
> 1. Process one record from each of the tasks the thread owns.
> 2. Check if commit / punctuate / poll / etc. is needed.
> But in 1) above we process tasks A,B,C,A,B,C,... and effectively we are 
> introducing context switches within the thread as it needs to load the task 
> variables etc for each record processed.
> What I was thinking is to process tasks as A,A,A,B,B,B,C,C,C... so that we 
> can reduce the context switches.
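
A minimal sketch of the two orderings described above, to make the difference concrete. This is not the Kafka Streams implementation: the Task class, the method names, and the use of numIterations as a per-task batch size are all assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchedTaskLoop {

    /** Hypothetical stand-in for a stream task: a name plus buffered records. */
    static final class Task {
        final String name;
        final Deque<String> records = new ArrayDeque<>();

        Task(String name, int recordCount) {
            this.name = name;
            for (int i = 0; i < recordCount; i++) {
                records.add(name + i);
            }
        }
    }

    /** Current behaviour: one record per task per pass (A,B,C,A,B,C,...). */
    static List<String> processInterleaved(List<Task> tasks, int numIterations) {
        List<String> order = new ArrayList<>();
        for (int i = 0; i < numIterations; i++) {
            for (Task task : tasks) {
                String record = task.records.poll();
                if (record != null) {
                    order.add(record);
                }
            }
            // The commit / punctuate / poll check would happen here.
        }
        return order;
    }

    /** Proposed behaviour: up to numIterations records per task (A,A,A,B,B,B,...). */
    static List<String> processBatched(List<Task> tasks, int numIterations) {
        List<String> order = new ArrayList<>();
        for (Task task : tasks) {
            for (int i = 0; i < numIterations; i++) {
                String record = task.records.poll();
                if (record == null) {
                    break; // this task has no more buffered records
                }
                order.add(record);
            }
        }
        // The commit / punctuate / poll check would happen here.
        return order;
    }

    /** Three tasks A, B, C with three buffered records each. */
    static List<Task> sampleTasks() {
        return List.of(new Task("A", 3), new Task("B", 3), new Task("C", 3));
    }

    public static void main(String[] args) {
        // prints [A0, B0, C0, A1, B1, C1, A2, B2, C2]
        System.out.println(processInterleaved(sampleTasks(), 3));
        // prints [A0, A1, A2, B0, B1, B2, C0, C1, C2]
        System.out.println(processBatched(sampleTasks(), 3));
    }
}
```

Both methods process the same records; only the order changes, so per-task state is loaded once per batch in the second variant rather than once per record.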



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
