[
https://issues.apache.org/jira/browse/FLINK-34336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Pohl updated FLINK-34336:
----------------------------------
Fix Version/s: (was: 1.20.0)
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState
> may hang sometimes
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-34336
> URL: https://issues.apache.org/jira/browse/FLINK-34336
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.19.0, 1.20.0
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
> Labels: pull-request-available, test-stability
> Fix For: 1.19.0
>
>
> AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState
> may hang in
> waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832},
> {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};{color}
> h2. Reason:
> The job has 2 tasks(vertices), after calling updateJobResourceRequirements.
> The source parallelism isn't changed (It's parallelism) , and the
> FlatMapper+Sink is changed from parallelism to parallelism2.
> So we expect the task number should be parallelism + parallelism2 instead of
> parallelism2.
>
> h2. Why it can be passed for now?
> Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by
> default. It means, flink job will rescale job 30 seconds after
> updateJobResourceRequirements is called.
>
> So the running tasks are old parallelism when we call
> waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832},
> {color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};. {color}
> IIUC, it cannot be guaranteed, and it's unexpected.
>
> h2. How to reproduce this bug?
> [https://github.com/1996fanrui/flink/commit/ffd713e24d37db2c103e4cd4361d0cd916d0d2f6]
> * Disable the cooldown
> * Sleep for a while before waitForRunningTasks
> If so, the job running in new parallelism, so `waitForRunningTasks` will hang
> forever.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)