[
https://issues.apache.org/jira/browse/SAMZA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinyu Liu updated SAMZA-863:
----------------------------
Attachment: DESIGN-SAMZA-863-2.pdf
> Support multi-threading in samza tasks
> --------------------------------------
>
> Key: SAMZA-863
> URL: https://issues.apache.org/jira/browse/SAMZA-863
> Project: Samza
> Issue Type: New Feature
> Reporter: Xinyu Liu
> Assignee: Xinyu Liu
> Attachments: DESIGN-SAMZA-863-0.pdf, DESIGN-SAMZA-863-1.pdf,
> DESIGN-SAMZA-863-2.pdf
>
>
> Currently a Samza container executes its tasks sequentially in a single
> thread. For example, suppose message 1 is in the pending queue for task 1
> and message 2 is in the pending queue for task 2. Task 1 processes message 1,
> and only after it completes can task 2 process message 2. If we want to
> handle more messages in parallel, we have to increase the container count,
> e.g. from 1 to 2 in the example.
> While this solution has been working for many CPU-bound job scenarios, we do
> see its drawback for IO-bound jobs. In this kind of job, the task makes
> IO/network requests, e.g. DB calls, REST calls or external service RPC calls.
> These IO calls significantly slow down the task processing. We can increase
> the container count in order to parallelize the IO calls, but that results in
> low CPU utilization. Even if we improve CPU utilization by allocating
> multiple containers on the same CPU core, it will still cause dramatic memory
> growth due to the memory being allocated for each container.
> To better scale the performance of IO-bound jobs, we are proposing to support
> multi-threaded processing in samza. The design proposal will come soon.
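The contrast described above can be illustrated with a minimal Java sketch. It is not the design from the attached proposal and does not use any Samza API: the class name IoBoundTasksSketch, the process helper, and the 100 ms sleep standing in for an IO call are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoBoundTasksSketch {

    // Hypothetical stand-in for a task handling one message while blocked on IO.
    static String process(String task, String message) throws InterruptedException {
        Thread.sleep(100); // simulated db/REST/RPC latency (assumed value)
        return task + " processed " + message;
    }

    // Behavior described in the issue: a single thread runs the tasks one by one.
    static List<String> runSequentially(String[][] pending) throws Exception {
        List<String> results = new ArrayList<>();
        for (String[] p : pending) {
            results.add(process(p[0], p[1]));
        }
        return results;
    }

    // Illustrative alternative: tasks run on a thread pool so their IO waits overlap.
    static List<String> runInParallel(String[][] pending, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String[] p : pending) {
                Callable<String> call = () -> process(p[0], p[1]);
                futures.add(pool.submit(call));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collect in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[][] pending = {{"task 1", "message 1"}, {"task 2", "message 2"}};

        long start = System.nanoTime();
        runSequentially(pending);
        long sequentialMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        runInParallel(pending, 2);
        long parallelMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("sequential: " + sequentialMs + " ms");
        System.out.println("parallel:   " + parallelMs + " ms");
    }
}
```

In this sketch both variants run inside one process, so the parallel version overlaps the waiting time without the per-container memory cost mentioned above. How a real container would handle ordering, checkpointing and thread safety is left to the design proposal.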
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)