noorall opened a new pull request, #25552:
URL: https://github.com/apache/flink/pull/25552
## What is the purpose of the change
Currently, the DefaultVertexParallelismAndInputInfosDecider is able to
implement a balanced distribution algorithm based on the amount of data and the
number of subpartitions, however it also has some limitations:
1. Currently, Decider selects the data distribution algorithm via the
AllToAll or Pointwise attribute of the input, which limits the ability of the
operator to dynamically modify the data distribution algorithm.
2. Doesn't support data volume-based balanced distribution for Pointwise
inputs.
3. For AllToAll type inputs, it does not support splitting the data
corresponding to the specific key, i.e., it cannot solve the data skewing
caused by single-key hotspot.
For that we plan to introduce the following improvements:
1. Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation to the
input characterisation which allows the operator to flexibly choose the data
balanced distribution algorithm.
2. Introducing a data volume-based data balanced distribution algorithm for
Pointwise inputs
3. Introducing the ability to split data corresponding to the specific key
to optimise AllToAll's data volume-based data balancing distribution algorithm.
## Brief change log
- Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation.
- Introducing amount-based data balanced distribution algorithm for
Pointwise.
- Introducing the ability to split data corresponding to the specific key
for AllToAll
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs
/ not documented)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]