1996fanrui opened a new pull request, #746: URL: https://github.com/apache/flink-kubernetes-operator/pull/746
Sorry for this fix! I rebased it after #744 was merged.

While developing other features, I found that `StandaloneAutoscalerExecutorTest.testScaling` fails occasionally due to a race condition. Here is a failing run from my fork repo: https://github.com/1996fanrui/flink-kubernetes-operator/actions/runs/7405774151/job/20149240482#step:5:22232

```
Error: Failures:
Error: StandaloneAutoscalerExecutorTest.testScaling:93 Actual and expected should have same size but actual size is: 1 while expected size is: 2
Actual was: [org.apache.flink.autoscaler.event.TestingEventCollector$Event@6b580b88, org.apache.flink.autoscaler.event.TestingEventCollector$Event@6d91790b]
Expected was: [9fe5243cd23e0727710fcd0b4d423970, 9eec17388e32cb02ec205ac9f638e2fe]
```

This log shows that the cause is a race condition: the `Actual was` list contains 2 events, but when the size was checked it contained only 1. The second event was added after the size check and before the failure message was built.

## Why does it fail occasionally?

https://github.com/apache/flink-kubernetes-operator/pull/744 added support for multiple threads in the autoscaler standalone, and the scaling control loop thread doesn't wait for the actual scaling of all jobs to finish. So I introduced a `CountDownLatch` to let the test wait until all scaling has finished.

However, `eventHandler.handleEvent` is called inside `scalingSingleJob`, so the latch can be released before the event is recorded. We should call `countDownLatch.countDown();` after `scalingSingleJob` returns.
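The ordering described above can be illustrated with a minimal, self-contained sketch. It is not the operator's actual code: the class `LatchOrderingSketch`, the method `scaleAll`, the job IDs, and the simplified `scalingSingleJob` signature are all hypothetical stand-ins, and a plain list plays the role of the event collector. The point it shows is only that `countDown()` runs after the per-job work (including the event emission) has completed, so a thread waiting on the latch sees every event.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchOrderingSketch {

    /** Hypothetical stand-in for the per-job scaling step; it records one event per job. */
    static void scalingSingleJob(String jobId, List<String> events) {
        try {
            Thread.sleep(10); // simulate scaling work that happens before the event is emitted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        events.add(jobId); // plays the role of eventHandler.handleEvent(...)
    }

    /** Scales all jobs on a thread pool and returns only after every job has finished. */
    static List<String> scaleAll(List<String> jobIds) {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch latch = new CountDownLatch(jobIds.size());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (String jobId : jobIds) {
                pool.execute(() -> {
                    try {
                        scalingSingleJob(jobId, events);
                    } finally {
                        // Count down only after scalingSingleJob has returned,
                        // so the event is already recorded when the waiter wakes up.
                        latch.countDown();
                    }
                });
            }
            if (!latch.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Scaling did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> events = scaleAll(List.of("job-1", "job-2"));
        System.out.println("Recorded events: " + events.size());
    }
}
```

If `latch.countDown()` were instead called before `scalingSingleJob` (or before the event is added), `await` could return while an event is still pending, which matches the symptom in the log: a size of 1 at assertion time and 2 entries by the time the message is printed.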
