[ https://issues.apache.org/jira/browse/SAMZA-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lakshmi Manasa Gaduputi updated SAMZA-2687:
-------------------------------------------
Description:
Problem: Throughput via parallelism is tied to the number of tasks, which
equals the partition count of the input streams. If a job is lagging and is
already at the max container count (= number of tasks = number of input
partitions), its only remaining option is to repartition the input. The need
for this arises from the processing time of the job’s logic, which is not
under Samza’s control.
Solution: The proposed approach is to allow an elastic task to consume a
portion of a partition (SystemStreamPartition, SSP). An elastic task is the
same as a task except that it consumes a sub-SSP.
An SEP will follow shortly.
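One possible way to picture a sub-SSP, pending the SEP: split each partition
into a fixed number of buckets by message key, and give each bucket to its own
elastic task. The sketch below is only an illustration under that assumption.
The class name, the "elasticity factor" parameter, and both methods are
hypothetical and are not Samza APIs; the issue does not say how a sub-SSP is
actually defined.

```java
import java.util.Objects;

/**
 * Illustrative sketch only: hypothetical names, not part of Samza.
 * Assumes a sub-SSP is a (partition, bucket) pair, where the bucket is
 * derived from the message key.
 */
public final class ElasticTaskSketch {

    private ElasticTaskSketch() {}

    /**
     * Upper bound on task count if every input partition is split into
     * elasticityFactor buckets (assumption: one elastic task per bucket).
     */
    public static int maxTaskCount(int partitionCount, int elasticityFactor) {
        requirePositive(partitionCount, "partitionCount");
        requirePositive(elasticityFactor, "elasticityFactor");
        return Math.multiplyExact(partitionCount, elasticityFactor);
    }

    /**
     * Picks the bucket within a partition for a message key.
     * floorMod keeps the result in [0, elasticityFactor) even when the
     * hash code is negative; a null key hashes to 0.
     */
    public static int bucketFor(Object key, int elasticityFactor) {
        requirePositive(elasticityFactor, "elasticityFactor");
        return Math.floorMod(Objects.hashCode(key), elasticityFactor);
    }

    private static void requirePositive(int value, String name) {
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be >= 1");
        }
    }

    public static void main(String[] args) {
        // 8 input partitions, each split in 2: up to 16 elastic tasks.
        System.out.println(maxTaskCount(8, 2));
        // The same key always maps to the same bucket of its partition.
        System.out.println(bucketFor("member-42", 2));
    }
}
```

With a factor of 1 every key falls into bucket 0, which matches today's
behaviour of one task per partition. Under this assumed scheme, messages with
the same key stay on the same elastic task, but ordering across different keys
within a partition would no longer be preserved.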
was:
Problem: Throughput via parallelism is tied to the number of tasks, which
equals the partition count of the input streams. If a job is lagging and is
already at the max container count (= number of tasks = number of input
partitions), its only remaining option is to repartition the input. The need
for this arises from the processing time of the job’s logic, which is not
under Samza’s control.
Solution: The proposed approach is to allow a virtual task to consume a
portion of a partition (SystemStreamPartition, SSP). A virtual task is the
same as a task except that it consumes a sub-SSP.
An SEP will follow shortly.
> Elasticity: scale up task count beyond the input partition count.
> -----------------------------------------------------------------
>
> Key: SAMZA-2687
> URL: https://issues.apache.org/jira/browse/SAMZA-2687
> Project: Samza
> Issue Type: New Feature
> Reporter: Lakshmi Manasa Gaduputi
> Assignee: Lakshmi Manasa Gaduputi
> Priority: Major