curcur edited a comment on pull request #12912:
URL: https://github.com/apache/flink/pull/12912#issuecomment-660627891


   1. Yes, I like Option 1. It looks very strange that two threads are trying 
to release the network resources at the same time.
   
   2. I wonder whether releasing the input channels in the same order in both places would also resolve the problem, but I am not sure how difficult that would be in this case.
   
   3. The code change seems to be based on an old interface, so I guess it may need some adjustments. The new one looks like this:
       `public void notifyBufferAvailable(int numAvailableBuffers) throws IOException ...`
   
   4. From reading the code alone, I think the current solution **should** be able to resolve the deadlock. But I have to admit I do not understand the details of exactly how the notifications of available buffers work.
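To make the same-order idea from point 2 concrete, here is a minimal Java sketch. It assumes each channel's `bufferQueue` lock can be ranked by a channel index; `Channel`, `index`, `releaseInOrder`, and `runBoth` are hypothetical names, not Flink's API. Both threads sort the channels before locking, so neither can hold ch2's lock while waiting for ch1's.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Flink code: every release path takes the
// per-channel bufferQueue locks in ascending channel-index order.
public class LockOrderingSketch {

    // Hypothetical stand-in for an input channel.
    static final class Channel {
        final int index;
        final Object bufferQueue = new Object(); // per-channel lock
        boolean released;                        // only written while its lock is held

        Channel(int index) {
            this.index = index;
        }
    }

    // Sorts the channels by index, then holds all their locks while releasing.
    static void releaseInOrder(List<Channel> channels) {
        List<Channel> sorted = new ArrayList<>(channels);
        sorted.sort(Comparator.comparingInt(c -> c.index));
        lockAndRelease(sorted, 0);
    }

    // Nests one synchronized block per channel, in the sorted order.
    private static void lockAndRelease(List<Channel> sorted, int i) {
        if (i == sorted.size()) {
            for (Channel c : sorted) {
                c.released = true;
            }
            return;
        }
        synchronized (sorted.get(i).bufferQueue) {
            lockAndRelease(sorted, i + 1);
        }
    }

    // Two threads name the channels in opposite orders; returns true if both finish.
    static boolean runBoth(long timeoutMillis) {
        Channel ch1 = new Channel(1);
        Channel ch2 = new Channel(2);
        Thread canceler = new Thread(() -> {
            for (int k = 0; k < 10_000; k++) {
                releaseInOrder(List.of(ch1, ch2));
            }
        });
        Thread task = new Thread(() -> {
            for (int k = 0; k < 10_000; k++) {
                releaseInOrder(List.of(ch2, ch1));
            }
        });
        canceler.setDaemon(true); // a stuck thread must not keep the JVM alive
        task.setDaemon(true);
        canceler.start();
        task.start();
        try {
            canceler.join(timeoutMillis);
            task.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !canceler.isAlive() && !task.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runBoth(5_000) ? "both threads finished" : "possible deadlock");
    }
}
```

Sorting by a fixed key is the standard lock-ordering remedy; whether both release paths in Flink can actually follow such an order is the open question raised in point 2.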
   
   Note that `isReleased` is set to `true` in `RemoteChannel#releaseAllResources`, which is called in the Canceler thread, not in the task thread (you can verify this through the stack trace).
   
   For the deadlock:
   Case 1:
   Canceler thread: sets ch1 as released, grabs the bufferQueue lock of ch1, and waits for the bufferQueue lock of ch2 to deliver the notification.
   
   Task thread: grabs the bufferQueue lock of ch2 and gets the notification from ch1, because ch1 has already been set as released.
   
   This won't cause a deadlock.
   
   Case 2:
   Task thread: grabs the bufferQueue lock of ch2, then grabs the bufferQueue lock of ch1 before ch1 is set as released by the Canceler thread.
   
   The Canceler thread is able to run releaseAllResources (setting `isReleased`), but it cannot grab ch1's lock until the task thread releases it.
   
   This won't cause a deadlock either.
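The two interleavings above can be modelled with a small Java sketch. It rests on one assumption the comment implies but does not show: `notifyBufferAvailable` returns without taking a channel's `bufferQueue` lock once `isReleased` is `true`. `Channel`, `runCase1`, and `runCase2` are hypothetical names, not Flink's `RemoteChannel`; latches fix the order of the steps described in each case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of Case 1 and Case 2; not Flink's RemoteChannel.
public class CancelerInterleavingSketch {
    static final class Channel {
        final Object bufferQueue = new Object(); // per-channel lock
        volatile boolean isReleased;             // set by the Canceler thread
        // Signature modeled on the newer interface quoted in the comment.
        public void notifyBufferAvailable(int numAvailableBuffers) throws IOException {
            if (isReleased) {
                return; // ASSUMPTION: released channels are skipped without locking
            }
            synchronized (bufferQueue) {
                // a real channel would hand over numAvailableBuffers buffers here
            }
        }
    }

    // Case 1: Canceler marks ch1 released and holds ch1's lock; task holds ch2's lock.
    static boolean runCase1() {
        Channel ch1 = new Channel();
        Channel ch2 = new Channel();
        CountDownLatch taskHoldsCh2 = new CountDownLatch(1);
        CountDownLatch cancelerHoldsCh1 = new CountDownLatch(1);
        Thread task = new Thread(() -> {
            synchronized (ch2.bufferQueue) {
                taskHoldsCh2.countDown();
                awaitQuietly(cancelerHoldsCh1);
                notifyQuietly(ch1); // ch1 is released: returns without ch1's lock
            }
        });
        Thread canceler = new Thread(() -> {
            awaitQuietly(taskHoldsCh2);
            ch1.isReleased = true;
            synchronized (ch1.bufferQueue) {
                cancelerHoldsCh1.countDown();
                notifyQuietly(ch2); // waits until the task thread leaves ch2's lock
            }
        });
        return runAndJoin(task, canceler);
    }

    // Case 2: task takes ch2's lock, then ch1's lock, before ch1 is released.
    static boolean runCase2() {
        Channel ch1 = new Channel();
        Channel ch2 = new Channel();
        CountDownLatch taskHoldsBoth = new CountDownLatch(1);
        Thread task = new Thread(() -> {
            synchronized (ch2.bufferQueue) {
                synchronized (ch1.bufferQueue) { // ch1 not released yet
                    taskHoldsBoth.countDown();
                }
            }
        });
        Thread canceler = new Thread(() -> {
            awaitQuietly(taskHoldsBoth);
            ch1.isReleased = true;           // setting the flag needs no lock
            synchronized (ch1.bufferQueue) { // may wait for the task thread
                notifyQuietly(ch2);
            }
        });
        return runAndJoin(task, canceler);
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    private static void notifyQuietly(Channel ch) {
        try {
            ch.notifyBufferAvailable(1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
    private static boolean runAndJoin(Thread a, Thread b) {
        a.setDaemon(true); // a stuck thread must not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(5_000);
            b.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !a.isAlive() && !b.isAlive(); // both finished: no deadlock
    }

    public static void main(String[] args) {
        System.out.println("case 1 finished: " + runCase1() + ", case 2 finished: " + runCase2());
    }
}
```

Without the `isReleased` fast path, Case 1 in this model becomes a lock-order inversion (the Canceler thread holds ch1 and wants ch2, the task thread holds ch2 and wants ch1), which appears to be the deadlock scenario under discussion.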


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

