[GitHub] [druid] jasonk000 commented on pull request #12303: Kafka & Kinesis stream ingest parsing in parallel

via GitHub Fri, 03 Feb 2023 09:48:19 -0800


jasonk000 commented on PR #12303:
URL: https://github.com/apache/druid/pull/12303#issuecomment-1416201807


   > Do you have a performance report that shows how this change improves the 
throughput by setting different values of thread count under same incoming 
message rate?
   
   Yes. With `parsingThreadCount=1` this patch carries a penalty (about 16% 
fewer rows than pre-patch), but with 2 or more threads throughput improves.
   
   
![image](https://user-images.githubusercontent.com/3196528/216670466-73788f13-1719-4f7e-9bc9-7fe45c2c2e0b.png)
   
   
   parsingThreadCount | ingested rows / 4 mins | throughput vs. pre-patch
   -- | -- | --
   1 | 52515 | 84%
   2 | 64926 | 104%
   3 | 74105 | 119%
   4 | 80810 | 130%
   5 | 86054 | 138%
   6 | 89136 | 143%
   7 | 94213 | 151%
   8 | 95680 | 153%
   pre-patch | 62372 | 100%
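
For anyone checking the percentage column: it appears to be each row count divided by the pre-patch count, rounded to a whole percent. A minimal sketch of that arithmetic, using only the numbers from the table (the names `rows`, `PRE_PATCH_ROWS`, and `relative_throughput` are made up for illustration and are not part of the patch):

```python
# Row counts per 4-minute window, copied from the table above.
rows = {
    1: 52515,
    2: 64926,
    3: 74105,
    4: 80810,
    5: 86054,
    6: 89136,
    7: 94213,
    8: 95680,
}
PRE_PATCH_ROWS = 62372  # baseline: the "pre-patch" row of the table


def relative_throughput(ingested, baseline=PRE_PATCH_ROWS):
    """Return throughput as a whole-number percentage of the baseline."""
    return round(100 * ingested / baseline)


# Map each parsingThreadCount to its percentage of pre-patch throughput.
relative = {threads: relative_throughput(n) for threads, n in rows.items()}

for threads, pct in relative.items():
    print(f"parsingThreadCount={threads}: {pct}% of pre-patch")
```

Read this way, 100% is the break-even point, so the single-thread configuration is the only one that falls below the pre-patch baseline.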
   
   This change shifts the performance impact mostly to (1) the Kafka ingestion 
flow and (2) index row generation.
   
   Focusing only on the task-runner thread:
   
   before:
   
![image](https://user-images.githubusercontent.com/3196528/216671908-50852ad4-32f5-4767-9a64-2ace84c67eac.png)
   
   after; notice that the purple `parseWithInputFormat` is moved to this thread:
   
![image](https://user-images.githubusercontent.com/3196528/216671932-66b41c80-b9de-4368-bd0f-300ba652c2bb.png)
   
   So the next bottleneck becomes the remainder of that loop, and any 
future improvements to the index generator will scale up with N threads.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

