[PR] [hotfix][autoscaler] Fix the StandaloneAutoscalerExecutorTest.testScaling fails occasionally due to race condition [flink-kubernetes-operator]

via GitHub Wed, 03 Jan 2024 22:36:18 -0800


1996fanrui opened a new pull request, #746:
URL: https://github.com/apache/flink-kubernetes-operator/pull/746

Sorry for this fix!

After #744 was merged, I rebased onto it. While developing other features, I
found that `StandaloneAutoscalerExecutorTest.testScaling` fails occasionally
due to a race condition.

Here is a failing run from my fork repo:

https://github.com/1996fanrui/flink-kubernetes-operator/actions/runs/7405774151/job/20149240482#step:5:22232

```
Error: Failures:
Error: StandaloneAutoscalerExecutorTest.testScaling:93
Actual and expected should have same size but actual size is:
1
while expected size is:
2
Actual was:
[org.apache.flink.autoscaler.event.TestingEventCollector$Event@6b580b88,
org.apache.flink.autoscaler.event.TestingEventCollector$Event@6d91790b]
Expected was:
[9fe5243cd23e0727710fcd0b4d423970, 9eec17388e32cb02ec205ac9f638e2fe]
```

This log points to a race condition: the "Actual was" list contains 2
events, but when the size was checked it was 1. The second event was
presumably added after the size was read.

## Why does it fail occasionally?

https://github.com/apache/flink-kubernetes-operator/pull/744 added support for
multiple threads in the autoscaler standalone, and the scaling control loop
thread does not wait for the actual scaling of all jobs to finish.

So I introduced a `CountDownLatch` to let the test wait until all scaling is
finished. However, `eventHandler.handleEvent` is called inside
`scalingSingleJob`, so we should call `countDownLatch.countDown();` only after
`scalingSingleJob` has returned.
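
The ordering described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual test or operator code: the class `LatchOrderingSketch`, the method `runAndCountEvents`, and the simplified `scalingSingleJob` signature are hypothetical stand-ins, and a plain list plays the role of the event collector.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchOrderingSketch {

    // Hypothetical stand-in for scalingSingleJob: the event is recorded
    // inside this method, mirroring eventHandler.handleEvent being called there.
    static void scalingSingleJob(String jobId, List<String> events) {
        events.add(jobId);
    }

    // Scales jobCount jobs on a thread pool and returns how many events
    // are visible once the latch has been released.
    static int runAndCountEvents(int jobCount) {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch countDownLatch = new CountDownLatch(jobCount);
        ExecutorService pool = Executors.newFixedThreadPool(jobCount);
        try {
            for (int i = 0; i < jobCount; i++) {
                String jobId = "job-" + i;
                pool.execute(() -> {
                    try {
                        scalingSingleJob(jobId, events);
                    } finally {
                        // Count down only after scalingSingleJob has returned,
                        // so the event is already recorded when the waiter wakes up.
                        countDownLatch.countDown();
                    }
                });
            }
            if (!countDownLatch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for scaling");
            }
            return events.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndCountEvents(2));
    }
}
```

If `countDown()` ran before the event was recorded, the waiting thread could read the list while an event was still missing, which would match the size mismatch in the failure above.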

