[ 
https://issues.apache.org/jira/browse/SAMZA-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Liu updated SAMZA-863:
----------------------------
    Description: 
Currently a Samza container executes its tasks sequentially in a single thread. 
For example, suppose messages 1 and 2 are in the pending queues for task 1 and 
task 2, respectively. Task 1 processes message 1, and only after it completes 
can task 2 process message 2. To handle more messages in parallel, we have to 
increase the container count, e.g. from 1 to 2 in this example.

While this solution has worked for many CPU-bound jobs, we do see its drawback 
for IO-bound jobs. In this kind of job, the task makes IO/network requests, 
e.g. DB calls, REST calls or external service RPC calls. These IO calls 
significantly slow down task processing. We can increase the container count 
to parallelize the IO calls, but that results in low CPU utilization. Even if 
we improve CPU utilization by allocating multiple containers on the same CPU 
core, it still causes dramatic memory growth, because memory is allocated 
separately for each container.

To better scale the performance of IO-bound jobs, we propose supporting 
multi-threaded processing in Samza. The design proposal will come soon.
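
To make the motivation concrete, here is a minimal sketch in plain Java. It is 
not Samza code and not the upcoming design; the class and method names 
(IoBoundTasksSketch, processMessage, runSequentially, runInThreadPool) are 
hypothetical, and Thread.sleep stands in for a blocking db/REST/RPC call. It 
only contrasts running IO-bound tasks one after another in a single thread 
with running them on a thread pool inside one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoBoundTasksSketch {

    // Hypothetical stand-in for a task handling one message with a blocking IO call.
    static String processMessage(String task, int message, long ioMillis)
            throws InterruptedException {
        Thread.sleep(ioMillis); // simulated db/REST/RPC latency (assumption)
        return task + " processed message " + message;
    }

    // Model described above: one thread, so task 2 waits until task 1 finishes.
    static long runSequentially(int taskCount, long ioMillis) throws Exception {
        long start = System.nanoTime();
        for (int i = 1; i <= taskCount; i++) {
            processMessage("task " + i, i, ioMillis);
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    // Illustrative alternative: one process, one thread per task, so IO waits overlap.
    static long runInThreadPool(int taskCount, long ioMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        long start = System.nanoTime();
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 1; i <= taskCount; i++) {
                final int id = i;
                results.add(pool.submit(() -> processMessage("task " + id, id, ioMillis)));
            }
            for (Future<String> result : results) {
                result.get(); // wait for every task to finish
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sequential:  " + runSequentially(2, 200) + " ms");
        System.out.println("thread pool: " + runInThreadPool(2, 200) + " ms");
    }
}
```

With two tasks that each block for about 200 ms, the sequential run should take 
roughly the sum of the waits, while the pooled run should take roughly one 
wait, without starting a second container. Exact timings depend on the machine.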

  was:
Currently a samza container executes the tasks sequentially in a single thread. 
For example, we have message 1 and 2 in the pending queue for task 1 and task 
2. Task 1 will process message 1, and util its completion task 2 can process 
message 2. If we want to handle more messages in parallel, we have to increase 
the container count, e.g. from 1 to 2 in the example.

While this solution has been working for many CPU-bound job scenarios, we do 
see its drawback for IO-bound jobs.In this kind of jobs, the task makes 
IO/Network requests, i.e, db calls, rest calls or external service RPC calls. 
These IO calls significantly slow down the task processing. We can increase 
container number in order to parallelize the IO calls, but it results in low 
CPU utilization. If we can improve CPU utilization by allocating multiple 
contains in the same CPU core, it will still cause dramatic memory growth due 
to the memory being allocated for each container.

To better scale the performance of IO-bound jobs, we are proposing to support 
multi-threaded processing in samza. The design proposal will come soon.


> Support multi-threading in samza tasks
> --------------------------------------
>
>                 Key: SAMZA-863
>                 URL: https://issues.apache.org/jira/browse/SAMZA-863
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Xinyu Liu
>            Assignee: Xinyu Liu
>
> Currently a Samza container executes its tasks sequentially in a single 
> thread. For example, suppose messages 1 and 2 are in the pending queues for 
> task 1 and task 2, respectively. Task 1 processes message 1, and only after 
> it completes can task 2 process message 2. To handle more messages in 
> parallel, we have to increase the container count, e.g. from 1 to 2 in this 
> example.
>
> While this solution has worked for many CPU-bound jobs, we do see its 
> drawback for IO-bound jobs. In this kind of job, the task makes IO/network 
> requests, e.g. DB calls, REST calls or external service RPC calls. These IO 
> calls significantly slow down task processing. We can increase the container 
> count to parallelize the IO calls, but that results in low CPU utilization. 
> Even if we improve CPU utilization by allocating multiple containers on the 
> same CPU core, it still causes dramatic memory growth, because memory is 
> allocated separately for each container.
>
> To better scale the performance of IO-bound jobs, we propose supporting 
> multi-threaded processing in Samza. The design proposal will come soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
