Rui Fan created FLINK-34336:
-------------------------------

             Summary: 
AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may 
hang sometimes
                 Key: FLINK-34336
                 URL: https://issues.apache.org/jira/browse/FLINK-34336
             Project: Flink
          Issue Type: Bug
          Components: Tests
    Affects Versions: 1.19.0
            Reporter: Rui Fan
            Assignee: Rui Fan
             Fix For: 1.19.0


AutoRescalingITCase#testCheckpointRescalingWithKeyedAndNonPartitionedState may 
hang in 
waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, 
{color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};{color}
h2. Reason:

The job has 2 tasks(vertices), after calling updateJobResourceRequirements. The 
source parallelism isn't changed (It's parallelism) , and the FlatMapper+Sink 
is changed from  parallelism to parallelism2.

So we expect the task number should be parallelism + parallelism2 instead of 
parallelism2.

 
h2. Why it can be passed for now?

Flink 1.19 supports the scaling cooldown, and the cooldown time is 30s by 
default. It means, flink job will rescale job 30 seconds after 
updateJobResourceRequirements is called.

 

So the running tasks are old parallelism when we call 
waitForRunningTasks({color:#9876aa}restClusterClient{color}{color:#cc7832}, 
{color}jobID{color:#cc7832}, {color}parallelism2){color:#cc7832};. {color}

IIUC, it cannot be guaranteed, and it's unexpected.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to