[ https://issues.apache.org/jira/browse/SAMZA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyu Liu updated SAMZA-863:
----------------------------
    Attachment: DESIGN-SAMZA-863-3.pdf

> Support multi-threading in samza tasks
> --------------------------------------
>
>                 Key: SAMZA-863
>                 URL: https://issues.apache.org/jira/browse/SAMZA-863
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Xinyu Liu
>            Assignee: Xinyu Liu
>         Attachments: DESIGN-SAMZA-863-0.pdf, DESIGN-SAMZA-863-1.pdf, 
> DESIGN-SAMZA-863-2.pdf, DESIGN-SAMZA-863-3.pdf
>
>
> Currently a Samza container executes its tasks sequentially in a single
> thread. For example, suppose message 1 is pending for task 1 and message 2
> is pending for task 2. Task 1 processes message 1, and only after it
> completes can task 2 process message 2. To handle more messages in parallel,
> we have to increase the container count, e.g. from 1 to 2 in this example.
> While this solution has worked for many CPU-bound jobs, it has drawbacks for
> IO-bound jobs. In these jobs, the task makes IO/network requests, e.g. DB
> calls, REST calls, or RPC calls to external services. These IO calls
> significantly slow down task processing. We can increase the container count
> to parallelize the IO calls, but this results in low CPU utilization. Even if
> we improve CPU utilization by placing multiple containers on the same CPU
> core, memory usage still grows dramatically because each container is
> allocated its own memory.
> To better scale the performance of IO-bound jobs, we propose supporting
> multi-threaded processing in Samza. The design proposal will follow soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
