azagrebin opened a new pull request #11667: [FLINK-15936] Harden 
TaskExecutorTest#testSlotAcceptance
URL: https://github.com/apache/flink/pull/11667
 
 
   ## What is the purpose of the change
   
   ### Concurrent `TaskSlotTable` access in test
   
   The test called `taskSlotTable.allocateSlot` from the test's main thread,
   concurrently with `taskSlotTable.createSlotReport`, which runs in the main TM
   thread while the TM tries to register at the RM. This race silently failed the
   RM registration in `TM.runAsync`.
   As a result, `RM.notifySlotAvailable` was not called in the TM.
   The `taskSlotTable` is not thread-safe and must be accessed only from the main
   RPC thread of the TM.
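
The thread-confinement rule described above can be illustrated with a minimal, self-contained sketch. It does not use Flink classes: `SlotTableSketch` and `MainThreadConfinementSketch` are hypothetical stand-ins for a non-thread-safe slot table and for an endpoint that owns a single main thread. Every access to the table is submitted to that one thread, so an allocation can never interleave with the creation of a slot report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a non-thread-safe slot table; not Flink's TaskSlotTable. */
class SlotTableSketch {
    // Plain HashMap: no synchronization, so concurrent use is unsafe.
    private final Map<Integer, String> slots = new HashMap<>();

    void allocateSlot(int index, String jobId) {
        slots.put(index, jobId);
    }

    List<String> createSlotReport() {
        List<String> report = new ArrayList<>();
        // Iterating while another thread calls put() may fail or yield a corrupt view.
        for (Map.Entry<Integer, String> entry : slots.entrySet()) {
            report.add(entry.getKey() + "->" + entry.getValue());
        }
        return report;
    }
}

/** Hypothetical endpoint that confines all table access to one "main" thread. */
class MainThreadConfinementSketch {
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    private final SlotTableSketch table = new SlotTableSketch();

    // Callers on any thread go through these methods; the table itself is
    // only ever touched by the single main thread.
    CompletableFuture<Void> allocateSlot(int index, String jobId) {
        return CompletableFuture.runAsync(() -> table.allocateSlot(index, jobId), mainThread);
    }

    CompletableFuture<List<String>> createSlotReport() {
        return CompletableFuture.supplyAsync(table::createSlotReport, mainThread);
    }

    void shutdown() {
        mainThread.shutdown();
    }

    /** Allocates two slots from the calling thread and returns the size of the report. */
    static int demo() {
        MainThreadConfinementSketch endpoint = new MainThreadConfinementSketch();
        try {
            endpoint.allocateSlot(0, "job-a");
            endpoint.allocateSlot(1, "job-b");
            // The single-thread executor runs tasks in submission order, so the
            // report is created only after both allocations have completed.
            return endpoint.createSlotReport().join().size();
        } finally {
            endpoint.shutdown();
        }
    }
}
```

Calling `MainThreadConfinementSketch.demo()` from a test thread is safe because the test never touches the table directly, which is the same property the refactored test relies on.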
   
   ### Failure in `establishResourceManagerConnection`
   
   `TaskExecutor#establishResourceManagerConnection` is not expected to fail.
   If it does, the failure completely breaks the TM's mechanism for connecting to the RM.
   As a hotfix, the PR proposes logging the failure at error level at the very least.
   Alternatively, we can consider calling `TM.onFatalError` as a follow-up.
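
As a rough illustration of the hotfix idea, the sketch below wraps a connection step so that an unexpected exception is logged at error level instead of disappearing inside an asynchronous callback. This is not the PR's actual code: `ConnectionFailureLoggingSketch` and `establishConnectionGuarded` are hypothetical names, and `java.util.logging` with `Level.SEVERE` is used here only as a stand-in for an error-level log call.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; the class and method names are assumptions, not Flink API. */
class ConnectionFailureLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(ConnectionFailureLoggingSketch.class.getName());

    /**
     * Runs the given connection step and reports whether it succeeded.
     * A failure is logged at error level so that it cannot go unnoticed.
     */
    static boolean establishConnectionGuarded(Runnable establishConnection) {
        try {
            establishConnection.run();
            return true;
        } catch (RuntimeException e) {
            // SEVERE is java.util.logging's closest equivalent to an error level.
            LOG.log(Level.SEVERE,
                    "Unexpected failure while establishing the ResourceManager connection.", e);
            return false;
        }
    }
}
```

Escalating to a fatal-error handler, the alternative mentioned above, would replace the `return false` branch with a call that terminates the process; that variant is not shown here.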
   
   ## Brief change log
   
   The PR refactors the test to wait properly for the RM registration
   and to allocate slots through the gateway in the TM thread.
   The PR also uses proper testing RM/JM implementations instead of mocks.
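
Waiting for the registration can be sketched with a future that a testing gateway completes once it has been called. The classes below are hypothetical and only mimic the pattern; they are not the testing RM/JM classes used in the PR. The test blocks on the future with a timeout and proceeds to slot allocation only after the registration has been observed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical testing gateway that records the registration instead of mocking it. */
class TestingResourceManagerSketch {
    final CompletableFuture<String> registeredTaskManager = new CompletableFuture<>();

    void registerTaskManager(String taskManagerId) {
        registeredTaskManager.complete(taskManagerId);
    }
}

/** Hypothetical test scenario: wait for registration before doing anything else. */
class WaitForRegistrationSketch {
    static String runScenario() {
        TestingResourceManagerSketch rm = new TestingResourceManagerSketch();
        ExecutorService tmMainThread = Executors.newSingleThreadExecutor();
        try {
            // The registration happens asynchronously on the simulated TM main thread.
            tmMainThread.execute(() -> rm.registerTaskManager("tm-1"));
            // Block until the registration is observed; a real test would
            // request slots only after this point.
            return rm.registeredTaskManager.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("Registration was not observed in time.", e);
        } finally {
            tmMainThread.shutdownNow();
        }
    }
}
```

Compared with a mock, a small testing implementation like this exposes an explicit synchronization point, so the test does not depend on timing.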
   
   ## Verifying this change
   
   To verify the fix, the test was run locally in a loop 30k times.

