[ 
https://issues.apache.org/jira/browse/FLINK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-14362:
----------------------------
    Description: 
Currently {{DefaultSchedulingResultPartition#getState()}} returns the state of 
partitions based on the partition producer's state. The state is used to make 
scheduling decisions.

However, it does not correctly reflect the true state of a partition.
For example, when a producer task switches to RUNNING but has not produced any 
data yet, its consumers should not be scheduled, to avoid unnecessary resource 
cost in lazy scheduling mode. However, the partition state will be RUNNING in 
{{DefaultSchedulingResultPartition}} and will trigger the scheduling of its 
consumers. This may lead to some vertices being scheduled earlier than expected, 
with no data to consume, which wastes resources.

I'd propose to change the enums in {{ResultPartitionState}} to be:
* CREATED // partition is just created or is just reset
* CONSUMABLE // pipelined partition has data produced or blocking partition's 
parent result finishes. Corresponds to IntermediateResultPartition#isConsumable.

The CONSUMABLE state is what the scheduler really cares about when making 
scheduling decisions.
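
A rough sketch of how the proposed two-state model could be derived is given 
here. It is an illustration only, not the actual Flink implementation: 
PartitionStateSketch, PartitionType, PartitionInfo (and its fields) and 
stateOf are hypothetical names introduced for this example. Only 
ResultPartitionState, CREATED and CONSUMABLE come from the proposal.

```java
public class PartitionStateSketch {

    /** The two states proposed in this issue. */
    enum ResultPartitionState { CREATED, CONSUMABLE }

    /** Hypothetical: the two kinds of partition the proposal distinguishes. */
    enum PartitionType { PIPELINED, BLOCKING }

    /** Hypothetical stand-in for the facts needed to decide consumability. */
    static final class PartitionInfo {
        final PartitionType type;
        final boolean hasProducedData;       // relevant for pipelined partitions
        final boolean parentResultFinished;  // relevant for blocking partitions

        PartitionInfo(PartitionType type, boolean hasProducedData, boolean parentResultFinished) {
            this.type = type;
            this.hasProducedData = hasProducedData;
            this.parentResultFinished = parentResultFinished;
        }
    }

    /**
     * Derives the state from the partition itself rather than from the
     * producer's execution state: a pipelined partition is consumable once
     * data has been produced, a blocking one once its parent result finished.
     */
    static ResultPartitionState stateOf(PartitionInfo p) {
        boolean consumable = (p.type == PartitionType.PIPELINED)
                ? p.hasProducedData
                : p.parentResultFinished;
        return consumable ? ResultPartitionState.CONSUMABLE : ResultPartitionState.CREATED;
    }

    public static void main(String[] args) {
        // Producer is RUNNING but has not emitted anything yet.
        PartitionInfo runningNoData = new PartitionInfo(PartitionType.PIPELINED, false, false);
        // Same partition after the first data has been produced.
        PartitionInfo runningWithData = new PartitionInfo(PartitionType.PIPELINED, true, false);

        System.out.println(stateOf(runningNoData));
        System.out.println(stateOf(runningWithData));
    }
}
```

With this shape, a producer that is RUNNING but has produced no data leaves 
its pipelined partition in CREATED, so its consumers would not be scheduled 
yet.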


  was:
Currently {{DefaultSchedulingResultPartition#getState()}} returns the state of 
partitions based on the partition producer's state. The state is used to make 
scheduling decisions.

However, it does not correctly reflect the true state of a partition.
For example, when a producer task switches to RUNNING but has not produced any 
data yet, its consumers should not be scheduled, to avoid unnecessary resource 
cost in lazy scheduling mode. However, the partition state will be RUNNING in 
{{DefaultSchedulingResultPartition}} and will trigger the scheduling of its 
consumers.

I'd propose to change the enums in {{ResultPartitionState}} to be:
* CREATED // partition is just created or is just reset
* CONSUMABLE // pipelined partition has data produced or blocking partition's 
parent result finishes. Corresponds to IntermediateResultPartition#isConsumable.

The CONSUMABLE state is what the scheduler cares about when making scheduling 
decisions.

This change also relieves 
LazyFromSourcesSchedulingStrategy/InputDependencyConstraintChecker of 
partition state management.



> Change DefaultSchedulingResultPartition to return correct partition state
> -------------------------------------------------------------------------
>
>                 Key: FLINK-14362
>                 URL: https://issues.apache.org/jira/browse/FLINK-14362
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently {{DefaultSchedulingResultPartition#getState()}} returns the state 
> of partitions based on the partition producer's state. The state is used to 
> make scheduling decisions.
> However, it does not correctly reflect the true state of a partition.
> For example, when a producer task switches to RUNNING but has not produced 
> any data yet, its consumers should not be scheduled, to avoid unnecessary 
> resource cost in lazy scheduling mode. However, the partition state will be 
> RUNNING in {{DefaultSchedulingResultPartition}} and will trigger the 
> scheduling of its consumers. This may lead to some vertices being scheduled 
> earlier than expected, with no data to consume, which wastes resources.
> I'd propose to change the enums in {{ResultPartitionState}} to be:
> * CREATED // partition is just created or is just reset
> * CONSUMABLE // pipelined partition has data produced or blocking partition's 
> parent result finishes. Corresponds to 
> IntermediateResultPartition#isConsumable.
> The CONSUMABLE state is what the scheduler really cares about when making 
> scheduling decisions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
