[ 
https://issues.apache.org/jira/browse/KAFKA-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-6037:
----------------------------------

    Assignee: Boyang Chen

> Make Sub-topology Parallelism Tunable
> -------------------------------------
>
>                 Key: KAFKA-6037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Boyang Chen
>            Priority: Major
>              Labels: needs-kip
>
> Today the downstream sub-topology's parallelism (i.e. the number of tasks) is 
> purely dependent on the upstream sub-topology's parallelism, which ultimately 
> depends on the source topic's number of partitions. However, this does not work 
> well in dynamic scaling scenarios.
> Imagine you have a simple aggregation application. It would have two 
> sub-topologies, split by the repartition topic. The first sub-topology would be 
> very light, as it reads from the input topics and writes to the repartition 
> topic based on the agg-key; the second sub-topology would do the actual work 
> with the agg state store, etc., and hence is computationally heavy. Right now 
> the first and second sub-topologies will always have the same number of tasks, 
> because the repartition topic's num.partitions is defined to be the same as 
> the source topic's num.partitions, so to scale up we have to increase the 
> number of input topic partitions.
> One way to improve on that is to use a large default number of partitions for 
> repartition topics and also allow users to override it (either through DSL 
> code or through config). With this, different sub-topologies could have 
> different numbers of tasks, i.e. parallelism units. In addition, users may 
> also use this config to "hint" the DSL translator NOT to create the 
> repartition topics (i.e. not to break up the sub-topologies) when they know 
> the data format.
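
The relationship described in the issue can be illustrated with a small, self-contained model. This is only a sketch: it does not use the Kafka Streams API, and the class name `SubTopologyParallelism` and its `taskCounts` methods are hypothetical names made up for this illustration. It assumes the simple two-sub-topology aggregation case from the description, where each sub-topology gets one task per partition of the topic it reads from.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not part of Kafka Streams: it only mirrors the
// task-count arithmetic described in the issue text.
public class SubTopologyParallelism {

    // Current behaviour as described: the repartition topic inherits the
    // source topic's partition count, so both sub-topologies get the same
    // number of tasks.
    static Map<String, Integer> taskCounts(int sourcePartitions) {
        return taskCounts(sourcePartitions, sourcePartitions);
    }

    // Proposed behaviour: the repartition topic's partition count is set
    // independently (a large default or a user override), so the downstream
    // sub-topology can have a different number of tasks.
    static Map<String, Integer> taskCounts(int sourcePartitions, int repartitionPartitions) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("upstream", sourcePartitions);        // reads the input topic
        tasks.put("downstream", repartitionPartitions); // reads the repartition topic
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(taskCounts(4));     // prints {upstream=4, downstream=4}
        System.out.println(taskCounts(4, 32)); // prints {upstream=4, downstream=32}
    }
}
```

In the second call the heavy aggregation side could be spread over 32 tasks without touching the 4-partition input topic, which is the scaling scenario the issue is asking for.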



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)