zhijiangW commented on pull request #12912: URL: https://github.com/apache/flink/pull/12912#issuecomment-660785430
@curcur Re 2. I did not get your point here. Could you explain it a bit more?

Re 3. This interface was changed after release-1.11, and this PR is based on the release-1.10 fix. For release-1.11 and the master branch, we should submit a separate PR accordingly.

> Notice that isReleased is set to true in RemoteChannel#releaseAllResources, which is called in the Canceler thread, but not in the task thread (you can verify it through the stack trace).

`RemoteInputChannel#releaseAllResources` was called by the canceler thread. Once `isReleased` has been set to true, a later call from the task thread has no effect.

The deadlock is actually caused by the task thread calling `StreamTask#cleanup`, which releases exclusive buffers while executing `CachedBufferStorage#close`. While recycling an exclusive buffer, it takes the lock for ch2 and triggers `ch1#notifyBufferAvailable`, which then waits for the lock of ch1. So whenever two threads may recycle buffers concurrently, this potential deadlock can occur. For example, while the canceler thread is releasing a channel and recycling its received buffers, the task thread might be recycling a buffer at the same time because that buffer has been fully consumed.
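To make the lock-order inversion concrete, here is a minimal, hypothetical Java model. It is not Flink code. `ch1Lock`, `ch2Lock`, and `recycle` are illustrative names, and it assumes (as a simplification) that the canceler thread holds ch1's lock and then needs ch2's lock, while the task thread holds ch2's lock and then needs ch1's. A real `lock()` on the second lock would hang forever, so the sketch uses `ReentrantLock.tryLock` with a timeout to detect the cycle instead of deadlocking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of two threads recycling buffers across two channel locks (not Flink API). */
public class LockOrderInversionDemo {

    /**
     * One recycling thread: holds its own channel's lock (as when recycling an exclusive
     * buffer), then tries the other channel's lock (as when notifyBufferAvailable is
     * triggered on that channel). Returns true if the second lock was acquired.
     */
    static boolean recycle(ReentrantLock own, ReentrantLock other,
                           CountDownLatch bothHoldOwn, CountDownLatch bothTried)
            throws InterruptedException {
        own.lock();
        try {
            bothHoldOwn.countDown();
            bothHoldOwn.await(); // make sure the other thread already holds its own lock
            // A plain other.lock() here would block forever: that is the deadlock.
            boolean acquired = other.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                other.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep our own lock until both attempts have finished
            return acquired;
        } finally {
            own.unlock();
        }
    }

    /** Runs the "task" and "canceler" threads; returns true if neither got its second lock. */
    public static boolean bothSecondLocksTimedOut() throws InterruptedException {
        ReentrantLock ch1Lock = new ReentrantLock();
        ReentrantLock ch2Lock = new ReentrantLock();
        CountDownLatch bothHoldOwn = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean taskGotCh1 = new AtomicBoolean();
        AtomicBoolean cancelerGotCh2 = new AtomicBoolean();

        // Task thread: recycles a buffer under ch2's lock, then needs ch1's lock.
        Thread task = new Thread(() -> {
            try {
                taskGotCh1.set(recycle(ch2Lock, ch1Lock, bothHoldOwn, bothTried));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Canceler thread (assumed ordering): releases ch1 under its lock, then needs ch2's lock.
        Thread canceler = new Thread(() -> {
            try {
                cancelerGotCh2.set(recycle(ch1Lock, ch2Lock, bothHoldOwn, bothTried));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        task.start();
        canceler.start();
        task.join();
        canceler.join();
        return !taskGotCh1.get() && !cancelerGotCh2.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("potential deadlock detected: " + bothSecondLocksTimedOut());
    }
}
```

Because each thread keeps its first lock until both have attempted the second one, both `tryLock` calls time out deterministically. With blocking `lock()` calls instead, this is exactly the circular wait described in the comment.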