[ 
https://issues.apache.org/jira/browse/FLINK-32119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723432#comment-17723432
 ] 

Gyula Fora commented on FLINK-32119:
------------------------------------

If we disable this we can still have problems when the data consumed by the 
sources is uneven. Lets say you have 8 partitions and you set parallelism 5, in 
that case you will have 3 source instances consuming 2 partitions and 2 
instances consuming 1 partition which would introduce a data/IO skew at the 
source.

This should also be configurable probably, and I am not even sure whether 
default should be off

> Disable key group alignment for determining source parallelism
> --------------------------------------------------------------
>
>                 Key: FLINK-32119
>                 URL: https://issues.apache.org/jira/browse/FLINK-32119
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: kubernetes-operator-1.6.0, kubernetes-operator-1.5.1
>
>
> After choosing the target parallelism for a vertex, we choose a higher 
> parallelism if that parallelism leads to evenly spreading the number of key 
> groups (=max parallelism).
> For one, sources don't have keyed state, so this adjustment does not make 
> sense.
> For another, we internally limit the max parallelism of sources to the number 
> of partitions discovered. This makes the adjustment even less meaningful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to