Zhilong Hong created FLINK-24316:
------------------------------------
Summary: Refactor IntermediateDataSet to have only one consumer
Key: FLINK-24316
URL: https://issues.apache.org/jira/browse/FLINK-24316
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Coordination
Reporter: Zhilong Hong
Fix For: 1.15.0
Currently, IntermediateDataSet has an assumption that an IntermediateDataSet
can be consumed by multiple consumers. However, this assumption has never came
to reality. For an upstream vertex that is connected to multiple downstream
vertices, it will generate multiple IntermediateDataSets. Each consumer is
corresponding to one IntermediateDataSet.
Furthermore, there are several checks in the code to make sure that an
IntermediateDataSet has only one consumer, like
{{Execution#getPartitionMaxParallelism}},
{{SsgNetworkMemoryCalculationUtils#getMaxSubpartitionNums}}, and etc. These
checks make the logic complicated. And it's hard to guarantee the consistency,
because we can't make sure all the calls to {{getConsumers}} have this check in
the future.
Since multiple consumers for IntermediateDataSet may not come true in a long
time, we think maybe it's better to refactor IntermediateDataSet to have only
one consumer, as the discussion mentioned in
[https://github.com/apache/flink/pull/16856].
If we are going to support multiple consumers for IntermediateDataSet in the
future, we can just bring it back and refactor all the usages.
As IntermediateDataSet changes, all classes related to it should change,
including IntermediateResult, IntermediateResultPartition,
DefaultResultPartition, and etc. All the related sanity checks need to be
removed, too.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)