zhijiangW commented on pull request #12912: URL: https://github.com/apache/flink/pull/12912#issuecomment-659316266
I considered several options to resolve this issue:

1. Get rid of the canceler thread completely to avoid the race condition in `SingleInputGate` and the respective channel, by delegating its work to the mailbox mechanism. This fits the long-term goal, but it involves many changes that should be done in a separate ticket in the future.
2. Introduce some kind of `BufferListener#isReleased` interface method, or explicitly remove the listeners from `LocalBufferPool` when the respective channel is released. This might burden the interface and add complexity to the release procedure.
3. The approach in the current PR: allow the released channel to be notified of an available buffer; the channel then checks its state outside the synchronized block first and exits immediately if it is already released.

Regarding verification, I could not reproduce this issue locally via the reported `StreamFaultToleranceTestBase`. I can also add a unit test to verify it if necessary. I remember there was a discussion about whether it is necessary to add unit tests like `RemoteInputChannel#testConcurrentRecycleAndRelease` to verify concurrency issues, and the conclusion seemed to be to rely on existing ITCases where possible, so I did not write new unit tests in this case.
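
To illustrate option 3, here is a minimal, self-contained sketch of the "check the released state outside the synchronized block first" idea. It is not the code from this PR: `ReleasedChannelSketch`, its `String` buffers, and `queuedBuffers()` are hypothetical stand-ins, and the second check under the lock is an assumption added so the sketch stays correct when `release()` runs concurrently.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for an input channel; not Flink's actual class. */
class ReleasedChannelSketch {
    private final AtomicBoolean isReleased = new AtomicBoolean(false);
    private final Queue<String> bufferQueue = new ArrayDeque<>();

    /** Called by the buffer pool; returns false if the buffer was not accepted. */
    boolean notifyBufferAvailable(String buffer) {
        // Fast path: a released channel exits immediately without taking the lock.
        if (isReleased.get()) {
            return false;
        }
        synchronized (bufferQueue) {
            // Assumption: re-check under the lock, since release() may have
            // run between the first check and acquiring the lock.
            if (isReleased.get()) {
                return false;
            }
            bufferQueue.add(buffer);
            return true;
        }
    }

    /** Called by the releasing thread (for example, a canceler thread). */
    void release() {
        if (isReleased.compareAndSet(false, true)) {
            synchronized (bufferQueue) {
                bufferQueue.clear();
            }
        }
    }

    int queuedBuffers() {
        synchronized (bufferQueue) {
            return bufferQueue.size();
        }
    }

    public static void main(String[] args) {
        ReleasedChannelSketch channel = new ReleasedChannelSketch();
        System.out.println(channel.notifyBufferAvailable("buffer-1"));
        channel.release();
        System.out.println(channel.notifyBufferAvailable("buffer-2"));
        System.out.println(channel.queuedBuffers());
    }
}
```

With this shape, the pool does not need to know whether a listener has been released (option 2); a late notification is simply rejected by the channel itself.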
