[ https://issues.apache.org/jira/browse/KAFKA-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959306#comment-16959306 ]
Sophie Blee-Goldman commented on KAFKA-4969:
--------------------------------------------
Bill's proposal has already been merged, so you should see a better
distribution of tasks of the same subtopology/topicGroupId as long as you've
upgraded to a version containing this fix (not sure which version this first
went into; any idea, [~bbejeck]?)
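As a rough illustration of what "spreading tasks of the same subtopology" means, here is a minimal Python sketch. It is not the merged assignor code. It assumes tasks are identified by hypothetical (topic_group_id, partition) tuples and clients by plain strings, and it uses one simple technique: sort so that tasks of each topicGroupId are adjacent, then deal them out round-robin.

```python
from collections import defaultdict


def spread_by_group(tasks, clients):
    """Hypothetical helper: deal (topic_group_id, partition) tasks out round-robin.

    Sorting places tasks of the same topicGroupId next to each other, so dealing
    them out in order spreads every subtopology across all clients (per-client
    counts for any one group differ by at most one).
    """
    assignment = defaultdict(list)
    for i, task in enumerate(sorted(tasks)):
        assignment[clients[i % len(clients)]].append(task)
    return dict(assignment)


# Two subtopologies (0 and 1) with two partitions each, two clients.
print(spread_by_group([(0, 0), (0, 1), (1, 0), (1, 1)], ["a", "b"]))
# → {'a': [(0, 0), (1, 0)], 'b': [(0, 1), (1, 1)]}
```

A naive contiguous split would instead give client "a" both tasks of subtopology 0. Note that this only balances tasks per subtopology; it still treats every task as equally expensive.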
This ticket was reopened to cover "true" state-aware assignment that
distinguishes between heavier stateful subtopologies and lighter stateless
ones. That should be covered by the work being planned as part of KIP-441; if
you're interested, you can read up on it here:
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams]
> State-store workload-aware StreamsPartitionAssignor
> ---------------------------------------------------
>
> Key: KAFKA-4969
> URL: https://issues.apache.org/jira/browse/KAFKA-4969
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Bill Bejeck
> Priority: Major
> Fix For: 1.1.0
>
>
> Currently, {{StreamPartitionAssignor}} does not distinguish different
> "types" of tasks. For example, a task can be stateless or have one or
> multiple stores.
> This can lead to a suboptimal task placement: assume there are 2 stateless
> and 2 stateful tasks and the app is running with 2 instances. To share the
> "store load", it would be good to place one stateless and one stateful task
> per instance. Right now, there is no guarantee of this, and it can happen
> that one instance processes both stateless tasks while the other processes
> both stateful tasks.
> We should improve {{StreamPartitionAssignor}} and introduce "task types"
> including a cost model for task placement. We should consider the following
> parameters:
> - number of stores
> - number of sources/sinks
> - number of processors
> - regular task vs standby task
> - in the case of standby tasks, which tasks have progressed the most with
> respect to restoration
> This improvement should be backed by a design document in the project wiki
> (no KIP required though) as it's a fairly complex change.
>
> There have been some additional discussions around task assignment on a
> related PR https://github.com/apache/kafka/pull/5390
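The cost model proposed in the quoted ticket could be sketched roughly as follows, in Python for brevity. The weights, the standby discount, the TaskInfo type and the task ids are all illustrative assumptions, not values from the ticket or from Kafka Streams, and the restoration progress of standby tasks is left out. Tasks are placed greedily, heaviest first, onto the currently least-loaded instance.

```python
from dataclasses import dataclass

# Assumed weights: the ticket lists the parameters but not how to weigh them.
WEIGHTS = {"stores": 10.0, "sources_sinks": 1.0, "processors": 0.5}
STANDBY_FACTOR = 0.5  # assumption: a standby task costs half of an active one


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical per-task summary used only by this sketch."""
    task_id: str
    num_stores: int
    num_sources_sinks: int
    num_processors: int
    standby: bool = False


def task_cost(t: TaskInfo) -> float:
    """Weighted sum of the ticket's parameters, discounted for standby tasks."""
    cost = (WEIGHTS["stores"] * t.num_stores
            + WEIGHTS["sources_sinks"] * t.num_sources_sinks
            + WEIGHTS["processors"] * t.num_processors)
    return cost * STANDBY_FACTOR if t.standby else cost


def assign(tasks, instances):
    """Greedy placement: heaviest task first, onto the least-loaded instance."""
    load = {i: 0.0 for i in instances}
    placement = {i: [] for i in instances}
    # sorted() is stable, so equally expensive tasks keep their input order.
    for t in sorted(tasks, key=task_cost, reverse=True):
        target = min(instances, key=lambda i: load[i])
        placement[target].append(t.task_id)
        load[target] += task_cost(t)
    return placement


# The ticket's example: 2 stateless and 2 stateful tasks on 2 instances.
tasks = [
    TaskInfo("0_0", num_stores=0, num_sources_sinks=2, num_processors=2),
    TaskInfo("0_1", num_stores=0, num_sources_sinks=2, num_processors=2),
    TaskInfo("1_0", num_stores=1, num_sources_sinks=2, num_processors=3),
    TaskInfo("1_1", num_stores=1, num_sources_sinks=2, num_processors=3),
]
print(assign(tasks, ["A", "B"]))
# → {'A': ['1_0', '0_0'], 'B': ['1_1', '0_1']}
```

With these illustrative weights, the two stateful tasks (cost 13.5 each) are placed first on different instances, and each stateless task (cost 3) then fills in beside one of them, which is exactly the one-stateless-plus-one-stateful split the ticket asks for.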
--
This message was sent by Atlassian Jira
(v8.3.4#803005)