[ 
https://issues.apache.org/jira/browse/FLINK-32119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-32119:
---------------------------------------
    Description: 
After choosing the target parallelism for a vertex, we choose a higher 
parallelism if that parallelism leads to evenly spreading the number of key 
groups (=max parallelism).

Sources don't have keyed state, so this adjustment does not make sense for key 
groups. However, we internally limit the max parallelism of sources to the 
number of partitions discovered. This prevents partition skew. 

The partition skew logic currently doesn’t work correctly when there are 
multiple topics because we use the total number of partitions discovered. Using 
a single max parallelism doesn’t yield skew free partition distribution then. 
However, this is also true for a single topic when the number of partitions is 
a prime number or a not easily divisible number. 

Hence, we should add an option to guarantee skew free partition distribution 
which means using the total number of partitions when another configuration is 
not possible. 



  was:
After choosing the target parallelism for a vertex, we choose a higher 
parallelism if that parallelism leads to evenly spreading the number of key 
groups (=max parallelism).

For one, sources don't have keyed state, so this adjustment does not make sense.

For another, we internally limit the max parallelism of sources to the number 
of partitions discovered. This makes the adjustment even less meaningful.

        Summary: Revise source partition skew logic   (was: Add option to 
control source partition skew)

> Revise source partition skew logic 
> -----------------------------------
>
>                 Key: FLINK-32119
>                 URL: https://issues.apache.org/jira/browse/FLINK-32119
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: kubernetes-operator-1.6.0, kubernetes-operator-1.5.1
>
>
> After choosing the target parallelism for a vertex, we choose a higher 
> parallelism if that parallelism leads to evenly spreading the number of key 
> groups (=max parallelism).
> Sources don't have keyed state, so this adjustment does not make sense for 
> key groups. However, we internally limit the max parallelism of sources to 
> the number of partitions discovered. This prevents partition skew. 
> The partition skew logic currently doesn’t work correctly when there are 
> multiple topics because we use the total number of partitions discovered. 
> Using a single max parallelism doesn’t yield skew free partition distribution 
> then. However, this is also true for a single topic when the number of 
> partitions is a prime number or a not easily divisible number. 
> Hence, we should add an option to guarantee skew free partition distribution 
> which means using the total number of partitions when another configuration 
> is not possible. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to