[ https://issues.apache.org/jira/browse/FLINK-32119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maximilian Michels updated FLINK-32119:
---------------------------------------
Description:
After choosing the target parallelism for a vertex, we choose a higher
parallelism if that parallelism spreads the key groups (= max parallelism)
evenly across subtasks.
Sources don't have keyed state, so this adjustment does not make sense for key
groups. However, we internally limit the max parallelism of sources to the
number of partitions discovered, so for sources the adjustment serves to
prevent partition skew.
The partition skew logic currently doesn't work correctly when there are
multiple topics, because we use the total number of partitions discovered. A
single max parallelism then doesn't yield a skew-free partition distribution.
The same is true for a single topic when the number of partitions is prime or
otherwise has few divisors.
Hence, we should add an option to guarantee a skew-free partition
distribution, which means falling back to the total number of partitions when
no other configuration is possible.
was:
After choosing the target parallelism for a vertex, we choose a higher
parallelism if that parallelism leads to evenly spreading the number of key
groups (=max parallelism).
For one, sources don't have keyed state, so this adjustment does not make sense.
For another, we internally limit the max parallelism of sources to the number
of partitions discovered. This makes the adjustment even less meaningful.
Summary: Revise source partition skew logic (was: Add option to
control source partition skew)
> Revise source partition skew logic
> -----------------------------------
>
> Key: FLINK-32119
> URL: https://issues.apache.org/jira/browse/FLINK-32119
> Project: Flink
> Issue Type: Bug
> Components: Autoscaler, Kubernetes Operator
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Fix For: kubernetes-operator-1.6.0, kubernetes-operator-1.5.1
>
>
> After choosing the target parallelism for a vertex, we choose a higher
> parallelism if that parallelism spreads the key groups (= max parallelism)
> evenly across subtasks.
> Sources don't have keyed state, so this adjustment does not make sense for
> key groups. However, we internally limit the max parallelism of sources to
> the number of partitions discovered, so for sources the adjustment serves to
> prevent partition skew.
> The partition skew logic currently doesn't work correctly when there are
> multiple topics, because we use the total number of partitions discovered. A
> single max parallelism then doesn't yield a skew-free partition distribution.
> The same is true for a single topic when the number of partitions is prime
> or otherwise has few divisors.
> Hence, we should add an option to guarantee a skew-free partition
> distribution, which means falling back to the total number of partitions
> when no other configuration is possible.
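
To make the single-topic case concrete, here is a minimal sketch of the
adjustment described above. It is not the autoscaler's actual code: the
function names are hypothetical, and it assumes partitions are spread as
evenly as possible across subtasks (round-robin style).

```python
def partitions_per_subtask(num_partitions: int, parallelism: int) -> list[int]:
    """Partition count per subtask, assuming an as-even-as-possible spread."""
    base, extra = divmod(num_partitions, parallelism)
    # The first `extra` subtasks carry one additional partition.
    return [base + (1 if i < extra else 0) for i in range(parallelism)]


def skew_free_parallelism(target: int, num_partitions: int) -> int:
    """Smallest parallelism >= target that divides num_partitions evenly.

    Falls back to num_partitions itself when no smaller divisor exists,
    and caps the result at num_partitions.
    """
    if target < 1 or num_partitions < 1:
        raise ValueError("target and num_partitions must be positive")
    for parallelism in range(target, num_partitions + 1):
        if num_partitions % parallelism == 0:
            return parallelism
    # target exceeds the partition count: extra subtasks would sit idle.
    return num_partitions


if __name__ == "__main__":
    # 12 partitions, target 5: parallelism 6 gives 2 partitions per subtask.
    print(skew_free_parallelism(5, 12), partitions_per_subtask(12, 6))
    # 7 partitions (prime), target 3: only 7 itself avoids skew.
    print(partitions_per_subtask(7, 3), skew_free_parallelism(3, 7))
```

With a prime partition count, any parallelism between 2 and the count minus
one leaves some subtasks with more partitions than others, so the only
skew-free choices are 1 and the partition count itself. The sketch does not
cover the multi-topic case, where dividing the total evenly does not guarantee
an even spread per topic.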