zhijiangW commented on pull request #12912:
URL: https://github.com/apache/flink/pull/12912#issuecomment-659316266


   I considered some options to resolve this issue:
   
   1. Get rid of the canceler thread completely and delegate its work to the 
mailbox mechanism, which avoids the race condition in SingleInputGate and the 
respective channels. This fits the long-term goal, but it involves many changes 
that should be done in a separate ticket in the future.
   
   2. Introduce a `BufferListener#isReleased` interface method, or explicitly 
remove listeners from `LocalBufferPool` when the respective channel is 
released. This might bloat the interface and add complexity to the release 
procedure.
   
   3. The approach in the current PR: allow an available buffer to be notified 
to a released channel, and have the channel first check its state outside the 
synchronized block so that it can exit immediately.
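
   To illustrate option 3, here is a minimal sketch of the pattern. It is not 
Flink's actual code: the `Channel` class, its fields, and its method bodies are 
hypothetical stand-ins, and buffers are modeled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReleasedCheckSketch {

    /** Hypothetical stand-in for an input channel that listens for buffers. */
    static final class Channel {
        private final AtomicBoolean released = new AtomicBoolean(false);
        private final ArrayDeque<String> buffers = new ArrayDeque<>();

        /** Returns true if the buffer was accepted, false if it was ignored. */
        boolean notifyBufferAvailable(String buffer) {
            // Check outside the synchronized block first: a released
            // channel exits immediately without contending for the lock.
            if (released.get()) {
                return false;
            }
            synchronized (buffers) {
                // Re-check under the lock, because release() may have run
                // between the first check and acquiring the lock.
                if (released.get()) {
                    return false;
                }
                buffers.add(buffer);
                return true;
            }
        }

        /** Marks the channel as released and drops any queued buffers. */
        void release() {
            if (released.compareAndSet(false, true)) {
                synchronized (buffers) {
                    buffers.clear();
                }
            }
        }

        int queuedBuffers() {
            synchronized (buffers) {
                return buffers.size();
            }
        }
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        System.out.println("accepted before release: " + channel.notifyBufferAvailable("b1"));
        channel.release();
        System.out.println("accepted after release: " + channel.notifyBufferAvailable("b2"));
        System.out.println("queued buffers: " + channel.queuedBuffers());
    }
}
```

   In this sketch the pool side does not need to know whether a listener is 
still alive; a notification that arrives after release is simply ignored by 
the channel.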
   
   Regarding verification, I cannot reproduce this issue locally via the 
reported `StreamFaultToleranceTestBase`. I can also add a unit test to verify 
it if necessary. I remember there was a discussion about whether it is 
necessary to add unit tests like 
`RemoteInputChannel#testConcurrentRecycleAndRelease` to verify concurrency 
issues, and the conclusion seemed to be to rely on existing ITCases where 
possible, so I did not write new unit tests in this case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

