Rui Fan created FLINK-34336:
-------------------------------
Summary:
AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may
hang sometimes
Key: FLINK-34336
URL: https://issues.apache.org/jira/browse/FLINK-34336
Project: Flink
Issue Type: Bug
Components: Tests
Affects Versions: 1.19.0
Reporter: Rui Fan
Assignee: Rui Fan
Fix For: 1.19.0
AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may
hang in
waitForRunningTasks(restClusterClient, jobID, parallelism2);
h2. Reason:
The job has 2 tasks (vertices). After updateJobResourceRequirements is called,
the source parallelism is unchanged (it stays at parallelism), while the
FlatMapper+Sink changes from parallelism to parallelism2.
So the expected number of running tasks is parallelism + parallelism2, not
parallelism2.
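The count above can be sketched in plain Java as a sum over per-vertex parallelism. This is only an illustration: the class name, the method name, and the concrete values 2 and 4 are hypothetical and are not taken from the test itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpectedTaskCount {

    /** Number of running subtasks = sum of the parallelism of every job vertex. */
    public static int totalSubtasks(Map<String, Integer> parallelismPerVertex) {
        int total = 0;
        for (int p : parallelismPerVertex.values()) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        int parallelism = 2;   // hypothetical value
        int parallelism2 = 4;  // hypothetical value

        // State after updateJobResourceRequirements has taken effect:
        // the source keeps parallelism, FlatMapper+Sink moves to parallelism2.
        Map<String, Integer> afterRescale = new LinkedHashMap<>();
        afterRescale.put("Source", parallelism);
        afterRescale.put("FlatMapper+Sink", parallelism2);

        System.out.println("expected running tasks: " + totalSubtasks(afterRescale));
        System.out.println("value the test waits for: " + parallelism2);
    }
}
```

With these sample values the job ends up with 6 running tasks, so a wait for exactly parallelism2 (4) tasks would never be satisfied once the rescale has happened.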
h2. Why does it pass for now?
Flink 1.19 supports a scaling cooldown, and the cooldown time is 30s by
default. This means the Flink job is rescaled 30 seconds after
updateJobResourceRequirements is called.
So the tasks are still running with the old parallelism when we call
waitForRunningTasks(restClusterClient, jobID, parallelism2);
IIUC, this cannot be guaranteed, and it's unexpected.
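A simplified model of this timing, as a sketch only: it assumes the rescale takes effect exactly when the cooldown expires, and it assumes (hypothetically, this is not stated above) that parallelism2 is twice parallelism, which is one way the old task count could match the awaited value. The class, method names, and sample values are illustrative and not part of Flink.

```java
public class CooldownModel {

    /**
     * Running subtasks of a two-vertex job at a given time after
     * updateJobResourceRequirements was called (simplified model).
     */
    public static int runningTasks(long elapsedSeconds, long cooldownSeconds,
                                   int parallelism, int parallelism2) {
        if (elapsedSeconds < cooldownSeconds) {
            // Rescale has not happened yet: both vertices still run at the old parallelism.
            return parallelism + parallelism;
        }
        // After the rescale: the source is unchanged, FlatMapper+Sink runs at parallelism2.
        return parallelism + parallelism2;
    }

    /** The condition the test waits for: running tasks == parallelism2. */
    public static boolean waitConditionMet(long elapsedSeconds, long cooldownSeconds,
                                           int parallelism, int parallelism2) {
        return runningTasks(elapsedSeconds, cooldownSeconds, parallelism, parallelism2)
                == parallelism2;
    }

    public static void main(String[] args) {
        int parallelism = 2;   // hypothetical value
        int parallelism2 = 4;  // hypothetical value, assumed to be 2 * parallelism
        long cooldown = 30;    // default cooldown in seconds, as described above

        System.out.println("during cooldown: " + waitConditionMet(1, cooldown, parallelism, parallelism2));
        System.out.println("after cooldown:  " + waitConditionMet(31, cooldown, parallelism, parallelism2));
    }
}
```

Under these assumptions the wait condition holds only while the old parallelism is still in place. If the check were first evaluated after the rescale, it would never become true, which matches the hang described above.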