[
https://issues.apache.org/jira/browse/KAFKA-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax resolved KAFKA-6037.
------------------------------------
Resolution: Fixed
Closing is as "fixed by KAFKA-8611".
> Make Sub-topology Parallelism Tunable
> -------------------------------------
>
> Key: KAFKA-6037
> URL: https://issues.apache.org/jira/browse/KAFKA-6037
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Levani Kokhreidze
> Priority: Major
> Labels: kip
> Fix For: 2.6.0
>
>
> Today the downstream sub-topology's parallelism (aka the number of tasks) are
> purely dependent on the upstream sub-topology's parallelism, which ultimately
> depends on the source topic's num. partitions. However this does not work
> perfectly with dynamic scaling scenarios.
> Imagine if your have a simple aggregation application, it would have two
> sub-topologies cut by the repartition topic, the first sub-topology would be
> very light as it reads from input topics and write to repartition topic based
> on the agg-key; the second sub-topology would do the actual work with the agg
> state store, etc, hence is heavy computational. Right now the first and
> second topology will always have the same number of tasks as the repartition
> topic num.partitions is defined to be the same as the source topic
> num.partitions, so to scale up we have to increase the number of input topic
> partitions.
> One way to improve on that, is to use a default large number for repartition
> topics and also allow users to override it (either through DSL code, or
> through config). Doing this different sub-topologies would have different
> number of tasks, i.e. parallelism units. In addition, users may also use this
> config to "hint" the DSL translator to NOT create the repartition topics
> (i.e. to not break up the sub-topologies) when she has the knowledge of the
> data format.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)