[jira] [Updated] (FLINK-24316) Refactor IntermediateDataSet to have only one consumer

Zhilong Hong (Jira) Fri, 17 Sep 2021 01:31:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zhilong Hong updated FLINK-24316:
---------------------------------
    Description: 
Currently, IntermediateDataSet assumes that it can be consumed by multiple 
consumers. However, this assumption has never been realized. An upstream 
vertex that is connected to multiple downstream vertices generates multiple 
IntermediateDataSets, and each consumer corresponds to exactly one 
IntermediateDataSet.

Furthermore, there are several checks in the code that make sure an 
IntermediateDataSet has only one consumer, for example in 
{{Execution#getPartitionMaxParallelism}} and 
{{SsgNetworkMemoryCalculationUtils#getMaxSubpartitionNums}}. These checks 
complicate the logic, and consistency is hard to guarantee, because we cannot 
ensure that every future call to {{getConsumers}} includes this check.
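
To illustrate the pattern described above, here is a minimal sketch. It does not reproduce Flink's actual code: {{ListBackedDataSet}}, {{ConsumerChecks}}, and the use of plain strings as consumer identifiers are hypothetical simplifications. It only shows why a list-based {{getConsumers}} forces every caller to repeat the same guard.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a data set that stores its consumers in a list. */
class ListBackedDataSet {
    private final List<String> consumers = new ArrayList<>();

    void addConsumer(String consumerId) {
        // Nothing here prevents a second consumer from being added.
        consumers.add(consumerId);
    }

    List<String> getConsumers() {
        return consumers;
    }
}

/** Hypothetical call site: each caller of getConsumers() must repeat this guard. */
class ConsumerChecks {
    static String getOnlyConsumer(ListBackedDataSet dataSet) {
        List<String> consumers = dataSet.getConsumers();
        if (consumers.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one consumer, found " + consumers.size());
        }
        return consumers.get(0);
    }
}
```

If a new caller reads {{getConsumers().get(0)}} without the guard, the single-consumer assumption is silently relied upon but no longer verified.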

Since support for multiple consumers of an IntermediateDataSet is unlikely to 
arrive for a long time, we think it is better to refactor IntermediateDataSet 
to have only one consumer, as discussed in 
[https://github.com/apache/flink/pull/16856#discussion_r691878539].

If we decide to support multiple consumers for IntermediateDataSet in the 
future, we can bring this back and refactor all the usages.

As IntermediateDataSet changes, all classes related to it should change as 
well, including IntermediateResult, IntermediateResultPartition, and 
DefaultResultPartition, among others. All the related sanity checks need to 
be removed, too.
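
One possible shape for the refactored class is sketched below. This is an assumption for illustration, not the change that was actually made in Flink: {{SingleConsumerDataSet}}, {{setConsumer}}, and {{getConsumer}} are hypothetical names, and consumers are again represented as plain strings. The point is that the invariant is enforced once, where the consumer is attached, so callers need no sanity checks.

```java
import java.util.Objects;

/** Hypothetical, simplified data set that holds at most one consumer. */
class SingleConsumerDataSet {
    // Null until a consumer has been attached.
    private String consumer;

    void setConsumer(String consumerId) {
        Objects.requireNonNull(consumerId, "consumerId");
        // The single-consumer invariant is enforced here, in one place.
        if (this.consumer != null) {
            throw new IllegalStateException(
                    "Data set already has a consumer: " + this.consumer);
        }
        this.consumer = consumerId;
    }

    /** Returns the only consumer, or null if none has been attached yet. */
    String getConsumer() {
        return consumer;
    }
}
```

With this shape, a caller simply reads {{getConsumer()}}, and an attempt to attach a second consumer fails at the point where it is made rather than at some later read.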

  was:
Currently, IntermediateDataSet assumes that it can be consumed by multiple 
consumers. However, this assumption has never been realized. An upstream 
vertex that is connected to multiple downstream vertices generates multiple 
IntermediateDataSets, and each consumer corresponds to exactly one 
IntermediateDataSet.

Furthermore, there are several checks in the code that make sure an 
IntermediateDataSet has only one consumer, for example in 
{{Execution#getPartitionMaxParallelism}} and 
{{SsgNetworkMemoryCalculationUtils#getMaxSubpartitionNums}}. These checks 
complicate the logic, and consistency is hard to guarantee, because we cannot 
ensure that every future call to {{getConsumers}} includes this check.

Since support for multiple consumers of an IntermediateDataSet is unlikely to 
arrive for a long time, we think it is better to refactor IntermediateDataSet 
to have only one consumer, as discussed in 
[https://github.com/apache/flink/pull/16856].

If we decide to support multiple consumers for IntermediateDataSet in the 
future, we can bring this back and refactor all the usages.

As IntermediateDataSet changes, all classes related to it should change as 
well, including IntermediateResult, IntermediateResultPartition, and 
DefaultResultPartition, among others. All the related sanity checks need to 
be removed, too.


> Refactor IntermediateDataSet to have only one consumer
> ------------------------------------------------------
>
>                 Key: FLINK-24316
>                 URL: https://issues.apache.org/jira/browse/FLINK-24316
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Runtime / Coordination
>            Reporter: Zhilong Hong
>            Priority: Major
>             Fix For: 1.15.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
