1996fanrui opened a new pull request, #746:
URL: https://github.com/apache/flink-kubernetes-operator/pull/746

   Sorry for this follow-up fix!
   
   After #744 was merged, I rebased onto it. While developing other features, I 
found that `StandaloneAutoscalerExecutorTest.testScaling` fails occasionally 
due to a race condition.
   
   Here is a failing run from my fork repo:
   
   
https://github.com/1996fanrui/flink-kubernetes-operator/actions/runs/7405774151/job/20149240482#step:5:22232
   
   ```
   Error:  Failures: 
   Error:    StandaloneAutoscalerExecutorTest.testScaling:93 
   Actual and expected should have same size but actual size is:
     1
   while expected size is:
     2
   Actual was:
     [org.apache.flink.autoscaler.event.TestingEventCollector$Event@6b580b88,
       org.apache.flink.autoscaler.event.TestingEventCollector$Event@6d91790b]
   Expected was:
     [9fe5243cd23e0727710fcd0b4d423970, 9eec17388e32cb02ec205ac9f638e2fe]
   ```
   
   This log points to a race condition: the `Actual was` list contains 2 
events, but the reported actual size was 1. Presumably the second event was 
added after the size was checked and before the failure message was built.
   
   ## Why does it fail occasionally?
   
   
   https://github.com/apache/flink-kubernetes-operator/pull/744 added support 
for multiple threads in the autoscaler standalone, and the scaling control loop 
thread doesn't wait for the actual scaling of all jobs to finish.
   
   So I introduced a `CountDownLatch` to let the test wait until all scaling 
has finished. However, `eventHandler.handleEvent` is called inside 
`scalingSingleJob`, so counting down before `scalingSingleJob` returns can 
release the test before the event is recorded. We should call 
`countDownLatch.countDown();` after `scalingSingleJob`.
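
   A minimal sketch of the intended ordering, not the actual operator code. 
The class `LatchOrderingSketch`, the `Events` holder, and this simplified 
`scalingSingleJob` are hypothetical stand-ins. The only point is that the 
latch is counted down after the per-job work (including the event emission) 
has completed, so the waiting thread sees every event.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchOrderingSketch {

    // Hypothetical stand-in for the test's event collector.
    static final class Events {
        final List<String> collected = new CopyOnWriteArrayList<>();
    }

    // Hypothetical stand-in for scalingSingleJob: the event is emitted inside it.
    static void scalingSingleJob(String jobId, Events events) {
        events.collected.add(jobId);
    }

    // Scales every job on a worker thread and returns the number of events
    // visible to the caller once the latch has reached zero.
    static int scaleAll(List<String> jobIds) {
        Events events = new Events();
        CountDownLatch latch = new CountDownLatch(jobIds.size());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobIds.size()));
        try {
            for (String jobId : jobIds) {
                pool.execute(() -> {
                    try {
                        scalingSingleJob(jobId, events);
                    } finally {
                        // Count down only after the job (and its event) is done.
                        latch.countDown();
                    }
                });
            }
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("scaling did not finish in time");
            }
            return events.collected.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(scaleAll(List.of("job-1", "job-2")));
    }
}
```

   Because `countDown()` happens-before a successful return from `await()`, 
the size read after the wait includes every event added before the count down. 
If `countDown()` were moved above `scalingSingleJob`, the waiting thread could 
wake up while an event is still pending, which matches the failure above.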

