azagrebin opened a new pull request #11667: [FLINK-15936] Harden TaskExecutorTest#testSlotAcceptance
URL: https://github.com/apache/flink/pull/11667

## What is the purpose of the change

### Concurrent `TaskSlotTable` in test

The test called `taskSlotTable.allocateSlot` from the test main thread, concurrently with `taskSlotTable.createSlotReport`, which the main TM thread calls while trying to register with the RM. This silently failed the RM registration in `TM.runAsync`. As a result, `RM.notifySlotAvailable` was not called in TM. The `taskSlotTable` is not thread-safe and must be accessed only from the main RPC thread of TM.

### Failure in `establishResourceManagerConnection`

A failure of `TaskExecutor#establishResourceManagerConnection` is not expected. It completely breaks the mechanism that connects the TM to the RM. As a hotfix, the PR proposes logging the failure at error level at least. Alternatively, we can consider calling `TM.onFatalError` as a follow-up.

## Brief change log

The PR refactors the test to wait properly for the RM registration and to allocate slots through the gateway, in the TM thread. The PR also uses proper testing RM/JM implementations instead of mocks.

## Verifying this change

To verify, the test has been looped locally 30k times.
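
For illustration, the thread-confinement rule described under "Concurrent `TaskSlotTable` in test" can be sketched in plain Java. This is not Flink code: `SlotTable`, `Gateway`, `requestSlot`, `slotReport` and `unsafeTable` are hypothetical names, and a single-threaded executor stands in for the TM main RPC thread. The sketch only shows the idea of routing every access through one thread, and of failing loudly rather than silently when another thread touches the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadConfinedSlotTableSketch {

    /** Hypothetical non-thread-safe table; a stand-in, not Flink's TaskSlotTable. */
    static final class SlotTable {
        private final Map<Integer, String> allocations = new HashMap<>();
        private final Thread owner;

        SlotTable(Thread owner) {
            this.owner = owner;
        }

        // Reject any access that does not come from the owning thread.
        private void checkOwner() {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("slot table accessed outside its main thread");
            }
        }

        boolean allocateSlot(int index, String jobId) {
            checkOwner();
            return allocations.putIfAbsent(index, jobId) == null;
        }

        Map<Integer, String> createSlotReport() {
            checkOwner();
            return new HashMap<>(allocations);
        }
    }

    /** Hypothetical gateway: every call is executed on the single "main" thread. */
    static final class Gateway implements AutoCloseable {
        private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
        // The table is created on the main thread, which becomes its owner.
        private final SlotTable table = CompletableFuture
                .supplyAsync(() -> new SlotTable(Thread.currentThread()), mainThread)
                .join();

        CompletableFuture<Boolean> requestSlot(int index, String jobId) {
            return CompletableFuture.supplyAsync(() -> table.allocateSlot(index, jobId), mainThread);
        }

        CompletableFuture<Map<Integer, String>> slotReport() {
            return CompletableFuture.supplyAsync(table::createSlotReport, mainThread);
        }

        // Exposed only to demonstrate what a direct call from a test thread does.
        SlotTable unsafeTable() {
            return table;
        }

        @Override
        public void close() {
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        try (Gateway gateway = new Gateway()) {
            boolean first = gateway.requestSlot(0, "job-a").join();
            boolean duplicate = gateway.requestSlot(0, "job-b").join();
            System.out.println("first=" + first + ", duplicate=" + duplicate);
            System.out.println("report=" + gateway.slotReport().join());
            try {
                // Direct access from the calling thread, as the old test did.
                gateway.unsafeTable().allocateSlot(1, "job-c");
            } catch (IllegalStateException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

In this sketch a test thread never touches the table directly; it only waits on the futures returned by the gateway, which mirrors the PR's approach of allocating slots through the gateway in the TM thread.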
