[ https://issues.apache.org/jira/browse/FLINK-32119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723432#comment-17723432 ]
Gyula Fora commented on FLINK-32119:
------------------------------------
If we disable this, we can still run into problems when the data consumed by
the sources is uneven. Let's say you have 8 partitions and you set the
parallelism to 5: you will have 3 source instances consuming 2 partitions each
and 2 instances consuming 1 partition each, which introduces a data/IO skew at
the source.
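
To make the arithmetic concrete, here is a minimal sketch of how partitions
would spread over source subtasks. It assumes a plain round-robin assignment
(partition p goes to subtask p % parallelism), which is a simplification and
not necessarily what a given connector does. The function name
partitions_per_subtask is made up for illustration.

```python
def partitions_per_subtask(num_partitions: int, parallelism: int) -> list[int]:
    """Count partitions per source subtask under an assumed round-robin assignment."""
    counts = [0] * parallelism
    for partition in range(num_partitions):
        # Assumption: partition p is read by subtask p % parallelism.
        counts[partition % parallelism] += 1
    return counts


# 8 partitions, parallelism 5: three subtasks read 2 partitions, two read 1.
uneven = partitions_per_subtask(8, 5)
print(uneven)  # [2, 2, 2, 1, 1]

# 8 partitions, parallelism 4: every subtask reads 2 partitions.
even = partitions_per_subtask(8, 4)
print(even)  # [2, 2, 2, 2]
```

With a parallelism that divides the partition count (4 or 8 here), every
subtask gets the same number of partitions and the skew goes away.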
This should probably also be configurable, and I am not even sure whether the
default should be off.
> Disable key group alignment for determining source parallelism
> --------------------------------------------------------------
>
> Key: FLINK-32119
> URL: https://issues.apache.org/jira/browse/FLINK-32119
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler, Kubernetes Operator
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Fix For: kubernetes-operator-1.6.0, kubernetes-operator-1.5.1
>
>
> After choosing the target parallelism for a vertex, we pick a higher
> parallelism if that higher value spreads the key groups evenly across
> subtasks (the number of key groups equals the max parallelism).
> For one, sources don't have keyed state, so this adjustment does not make
> sense.
> For another, we internally limit the max parallelism of sources to the number
> of partitions discovered. This makes the adjustment even less meaningful.
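
For readers unfamiliar with the adjustment under discussion, the following is
a rough sketch of the idea, not the operator's actual code. It assumes that
"evenly spreading" means rounding the target up to the next value that divides
the max parallelism. The name align_to_key_groups is hypothetical.

```python
def align_to_key_groups(target: int, max_parallelism: int) -> int:
    """Round target up to the next divisor of max_parallelism (assumed rule)."""
    for candidate in range(target, max_parallelism + 1):
        # A divisor gives every subtask the same number of key groups.
        if max_parallelism % candidate == 0:
            return candidate
    # No divisor found at or above the target: keep the target unchanged.
    return target


# Keyed vertex with 128 key groups: a target of 5 is bumped to 8.
print(align_to_key_groups(5, 128))  # 8

# Source whose max parallelism was capped to 8 discovered partitions.
print(align_to_key_groups(3, 8))  # 4
print(align_to_key_groups(5, 8))  # 8
```

Under this assumed rule, a source capped at 8 partitions would only ever end
up at a parallelism that divides 8, which is the case the comment above is
concerned about losing if the adjustment is disabled.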