zhijiangW commented on pull request #12912:
URL: https://github.com/apache/flink/pull/12912#issuecomment-660785430


   @curcur 
   Re 2. I didn't quite get your point here. Could you explain it a bit more?
   Re 3. This interface was changed in release-1.11, while this PR is a fix based on release-1.10. For release-1.11 and the master branch, we should submit a separate PR accordingly.
   
   > Notice that isReleased is set to true in 
RemoteChannel#releaseAllResources, which is called in the Canceler thread, but 
not in the task thread (you can verify it through the stack trace).
   
   `RemoteInputChannel#releaseAllResources` was called by the canceler thread. Once `isReleased` has been set to true, a later call to it from the task thread has no effect. The deadlock is actually caused by the task thread calling `StreamTask#cleanup`, which releases the exclusive buffers while executing `CachedBufferStorage#close`. While recycling those exclusive buffers, the task thread takes the lock of ch2 and triggers `ch1#notifyBufferAvailable`, which then waits for the lock of ch1.
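A minimal Java sketch of the idempotent release described above, assuming a simplified `Channel` class (hypothetical, not Flink's `RemoteInputChannel`). The first call flips `isReleased` and recycles the received buffers. Any later call, such as a second one from the task thread, returns immediately. This idempotence only prevents a double release; it does not stop the task thread from recycling buffers through a different path, which is where the deadlock comes from.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Flink's RemoteInputChannel: an idempotent release
// guarded by an isReleased flag.
public class IdempotentReleaseSketch {

    static final class Channel {
        private final AtomicBoolean isReleased = new AtomicBoolean(false);
        private final ArrayDeque<Integer> receivedBuffers = new ArrayDeque<>();
        final AtomicInteger recycledCount = new AtomicInteger();

        // Accepts a received buffer unless the channel has already been released.
        synchronized void onBuffer(int bufferId) {
            if (!isReleased.get()) {
                receivedBuffers.add(bufferId);
            }
        }

        void releaseAllResources() {
            // Only the first caller flips the flag; later calls (e.g. from the task
            // thread after the canceler thread already released) do nothing.
            if (isReleased.compareAndSet(false, true)) {
                synchronized (this) {
                    while (receivedBuffers.poll() != null) {
                        recycledCount.incrementAndGet(); // stand-in for recycling a buffer
                    }
                }
            }
        }

        boolean isReleased() {
            return isReleased.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Channel channel = new Channel();
        for (int i = 0; i < 3; i++) {
            channel.onBuffer(i);
        }
        Thread canceler = new Thread(channel::releaseAllResources, "canceler");
        canceler.start();
        canceler.join();
        channel.releaseAllResources(); // second call, as from the task thread: no effect
        System.out.println("released=" + channel.isReleased()
                + ", recycled=" + channel.recycledCount.get());
    }
}
```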
   
   So whenever two threads may recycle buffers concurrently, this deadlock can occur. For example, while the canceler thread is releasing the channel and recycling the received buffers, the task thread might at the same time recycle a buffer that it has fully consumed.
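To make the lock-order inversion concrete, here is a small, self-contained Java sketch. The `Channel` class is hypothetical, not Flink code. The task thread holds ch2's lock and needs ch1's, as described above. The opposite order for the canceler thread (holding ch1, needing ch2) is an assumption made for illustration. A `CountDownLatch` forces the bad interleaving deterministically, and `ThreadMXBean#findMonitorDeadlockedThreads` confirms that the JVM sees the two threads deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, not Flink code: two threads take two channel locks
// in opposite orders, producing a lock-order-inversion deadlock.
public class LockOrderDeadlockSketch {

    static final class Channel {
        // Recycles a buffer under this channel's lock, then notifies another
        // channel that a buffer became available (which needs that channel's lock).
        void recycle(Channel toNotify, CountDownLatch bothLocksHeld) {
            synchronized (this) {
                bothLocksHeld.countDown();
                try {
                    bothLocksHeld.await(); // wait until the other thread holds its own lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                toNotify.notifyBufferAvailable();
            }
        }

        void notifyBufferAvailable() {
            synchronized (this) {
                // would hand the available buffer to this channel
            }
        }
    }

    // Starts both threads and reports whether the JVM detects a monitor deadlock.
    static boolean runAndDetect() {
        Channel ch1 = new Channel();
        Channel ch2 = new Channel();
        CountDownLatch latch = new CountDownLatch(2);
        // Assumption: the canceler holds ch1's lock and then needs ch2's.
        Thread canceler = new Thread(() -> ch1.recycle(ch2, latch), "canceler");
        // As in the scenario described: the task thread holds ch2's lock and needs ch1's.
        Thread task = new Thread(() -> ch2.recycle(ch1, latch), "task");
        canceler.setDaemon(true); // deadlocked daemon threads do not keep the JVM alive
        task.setDaemon(true);
        canceler.start();
        task.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock exists
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + runAndDetect());
    }
}
```

One general way to break such a cycle is to avoid calling into another channel while holding your own lock, for example by collecting the notifications and firing them after the `synchronized` block exits. This is a generic remedy, not necessarily the approach taken in this PR.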

