[ 
https://issues.apache.org/jira/browse/FLINK-33968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lijie Wang updated FLINK-33968:
-------------------------------
    Description: 
Currently, when using dynamic graphs, the subpartition-num of a task is lazily 
calculated until the task deployment moment, this may lead to some 
uncertainties in job recovery scenarios:

Before jm crashs, when deploying upstream tasks, the parallelism of downstream 
vertex may be unknown, so the subpartiton-num will be the max parallelism of 
downstream job vertex. However, after jm restarts, when deploying upstream 
tasks, the parallelism of downstream job vertex may be known(has been 
calculated before jm crashs and been recovered after jm restarts), so the 
subpartiton-num will be the actual parallelism of downstream job vertex. The 
difference of calculated subpartition-num will lead to the partitions generated 
before jm crashs cannot be reused after jm restarts.

We will solve this problem by advancing the calculation of subpartitoin-num to 
the moment of initializing executon job vertex (in ctor of 
IntermediateResultPartition)

  was:
Currently, when using dynamic graphs, the subpartition-num of a task is lazily 
calculated until the task deployment moment, this may lead to some 
uncertainties in job recovery scenarios.

Before jm crashs, when deploying upstream tasks, the parallelism of downstream 
vertex may be unknown, so the subpartiton-num will be the max parallelism of 
downstream job vertex. However, after jm restarts, when deploying upstream 
tasks, the parallelism of downstream job vertex may be known(has been 
calculated before jm crashs and been recovered after jm restarts), so the 
subpartiton-num will be the actual parallelism of downstream job vertex.
 
The difference of calculated subpartition-num will lead to the partitions 
generated before jm crashs cannot be reused after jm restarts.

We will solve this problem by advancing the calculation of subpartitoin-num to 
the moment of initializing executon job vertex (in ctor of 
IntermediateResultPartition)


> Compute the number of subpartitions when initializing executon job vertices
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-33968
>                 URL: https://issues.apache.org/jira/browse/FLINK-33968
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Lijie Wang
>            Assignee: Lijie Wang
>            Priority: Major
>
> Currently, when using dynamic graphs, the subpartition-num of a task is 
> lazily calculated until the task deployment moment, this may lead to some 
> uncertainties in job recovery scenarios:
> Before jm crashs, when deploying upstream tasks, the parallelism of 
> downstream vertex may be unknown, so the subpartiton-num will be the max 
> parallelism of downstream job vertex. However, after jm restarts, when 
> deploying upstream tasks, the parallelism of downstream job vertex may be 
> known(has been calculated before jm crashs and been recovered after jm 
> restarts), so the subpartiton-num will be the actual parallelism of 
> downstream job vertex. The difference of calculated subpartition-num will 
> lead to the partitions generated before jm crashs cannot be reused after jm 
> restarts.
> We will solve this problem by advancing the calculation of subpartitoin-num 
> to the moment of initializing executon job vertex (in ctor of 
> IntermediateResultPartition)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to