[jira] [Commented] (KAFKA-9756) Refactor the main loop to process more than one record of one task at a time

ASF GitHub Bot (Jira) Wed, 25 Mar 2020 18:03:11 -0700


    [ https://issues.apache.org/jira/browse/KAFKA-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067266#comment-17067266 ]


ASF GitHub Bot commented on KAFKA-9756:
---------------------------------------

guozhangwang commented on pull request #8358: KAFKA-9756: Process more than one 
record of one task at a time
URL: https://github.com/apache/kafka/pull/8358
 
 
   1. Within a single while loop, process the tasks in the order AAABBBCCC 
instead of ABCABCABC. This also helps the follow-up PR that times the per-task 
processing ratio: the time needs to be recorded less often, hence less overhead.
   
   2. Add thread-level process / punctuate / poll / commit ratio metrics.
   
   3. Fix a few issues discovered along the way (commented inline).
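
As a rough illustration of item 2, a thread-level ratio can be thought of as the time spent in one phase divided by the total time spent across all phases. The sketch below is not the code in this PR and does not use the Kafka metrics API; `ThreadRatioSketch`, `Phase`, `record` and `ratio` are hypothetical names chosen only for this example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka Streams: accumulates the time a
// thread spends in each phase of its main loop and reports each phase's
// share of the total.
public class ThreadRatioSketch {

    public enum Phase { PROCESS, PUNCTUATE, POLL, COMMIT }

    private final Map<Phase, Long> nanosByPhase = new EnumMap<>(Phase.class);

    // Add the elapsed time of one phase invocation to its running total.
    public void record(Phase phase, long elapsedNanos) {
        nanosByPhase.merge(phase, elapsedNanos, Long::sum);
    }

    // Fraction of all recorded time spent in the given phase (0.0 if nothing
    // has been recorded yet).
    public double ratio(Phase phase) {
        long total = 0L;
        for (long nanos : nanosByPhase.values()) {
            total += nanos;
        }
        if (total == 0L) {
            return 0.0;
        }
        return (double) nanosByPhase.getOrDefault(phase, 0L) / total;
    }
}
```

With this shape, recording once per batch of records rather than once per record means fewer timer reads per loop iteration, which is the overhead reduction item 1 refers to.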
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Refactor the main loop to process more than one record of one task at a time
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-9756
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9756
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>
> Our current main loop is implemented as follows:
> 1. Loop over all tasks that have records to process, processing one 
> record from each task at a time.
> 2. After processing one record from each task, check whether commit / 
> punctuate / poll etc. is needed.
> Because we process one record at a time from a task and then move on to 
> the next task, we are effectively spending lots of time on context switches. 
> Maybe we can first investigate what happens if each task is hosted by 
> an individual thread, and see whether the context switch cost is not already 
> worse (which would mean our current implementation is already a baseline). If 
> that's true, we can consider working on one task at a time and see if it is 
> more efficient.
> For num.Iterations:
> 1. Process one record from each of the tasks the thread owns.
> 2. Check if commit / punctuate / poll / etc. is needed.
> But in 1) above we process tasks A,B,C,A,B,C,... and are effectively 
> introducing context switches within the thread, as it needs to load the task 
> variables etc. for each record processed.
> What I was thinking is to process tasks as A,A,A,B,B,B,C,C,C,... so that we 
> can reduce the context switches.
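
The two orderings described in the issue can be compared with a small stand-alone sketch. This is only an illustration under simplifying assumptions, not the Kafka Streams main loop: tasks are modelled as named queues of records, and `ProcessingOrderSketch`, `roundRobin`, `batched`, `sampleTasks` and `maxPerTask` are hypothetical names introduced for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each task is a named queue of pending records.
// Both methods return the sequence of task names in processing order.
public class ProcessingOrderSketch {

    // One record per task per pass: A,B,C,A,B,C,...
    public static List<String> roundRobin(Map<String, Deque<String>> tasks) {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<String, Deque<String>> task : tasks.entrySet()) {
                if (task.getValue().poll() != null) {
                    order.add(task.getKey());
                    progressed = true;
                }
            }
            // A real loop would check commit / punctuate / poll here.
        }
        return order;
    }

    // Up to maxPerTask records from one task before moving to the next:
    // A,A,A,B,B,B,C,C,C,...
    public static List<String> batched(Map<String, Deque<String>> tasks, int maxPerTask) {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<String, Deque<String>> task : tasks.entrySet()) {
                for (int i = 0; i < maxPerTask; i++) {
                    if (task.getValue().poll() == null) {
                        break; // this task has no more buffered records
                    }
                    order.add(task.getKey());
                    progressed = true;
                }
            }
            // A real loop would check commit / punctuate / poll here.
        }
        return order;
    }

    // Three tasks A, B, C with three buffered records each.
    public static Map<String, Deque<String>> sampleTasks() {
        Map<String, Deque<String>> tasks = new LinkedHashMap<>();
        for (String name : new String[] {"A", "B", "C"}) {
            Deque<String> records = new ArrayDeque<>();
            for (int i = 0; i < 3; i++) {
                records.add(name + "-" + i);
            }
            tasks.put(name, records);
        }
        return tasks;
    }
}
```

In this model, `batched(sampleTasks(), 3)` visits each task once per pass, whereas `roundRobin(sampleTasks())` switches task after every record. The batch size is a trade-off: larger batches mean fewer switches but a longer wait before the next commit / punctuate / poll check.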



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
