zhengchenyu commented on PR #4270:
URL: https://github.com/apache/hadoop/pull/4270#issuecomment-2222568457

   @goiri @slfan1989 
   Can you please review this PR? This PR has been running in our cluster for 
over 2 years. 
   
   The main reason for the deadlock described in this issue is that 
ResourceManager has many types of locks, and we have no clear rule governing 
the order in which they may be acquired. I think this is an important topic.
   
   I think we should adopt this rule: **Locks may only be acquired in the 
order Node --> App --> AppAttempt. They must never be acquired in the opposite 
direction.**
    This PR avoids acquiring locks in the order from AppAttempt to Node. It 
does this by introducing the APP_LOG_AGG_STATUS_UPDATE event.
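
To illustrate the rule, here is a minimal sketch. It is not the code in this PR. The class `LockOrderSketch`, its three locks, and the `events` queue are hypothetical stand-ins for the ResourceManager's Node, App and AppAttempt locks and for its event dispatcher. Node-side code takes the locks in the allowed order. Attempt-side code never takes the Node lock while holding its own lock; it posts an event that is handled later, once the AppAttempt lock has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Hypothetical stand-ins for the Node, App and AppAttempt locks.
    static final ReentrantLock nodeLock = new ReentrantLock();
    static final ReentrantLock appLock = new ReentrantLock();
    static final ReentrantLock attemptLock = new ReentrantLock();

    // Hypothetical queue standing in for an asynchronous event dispatcher.
    static final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    // Node-side state; only touched while nodeLock is held.
    static final List<String> nodeLog = new ArrayList<>();

    // Allowed direction: Node --> App --> AppAttempt.
    static void nodeUpdate(String status) {
        nodeLock.lock();
        try {
            appLock.lock();
            try {
                attemptLock.lock();
                try {
                    nodeLog.add("direct:" + status);
                } finally {
                    attemptLock.unlock();
                }
            } finally {
                appLock.unlock();
            }
        } finally {
            nodeLock.unlock();
        }
    }

    // Attempt-side code: instead of taking nodeLock while holding
    // attemptLock (the forbidden direction), it only enqueues an event.
    static void attemptReportsStatus(String status) {
        attemptLock.lock();
        try {
            events.add(status);
        } finally {
            attemptLock.unlock();
        }
    }

    // Event handler: takes nodeLock without holding any AppAttempt lock.
    static void drainEvents() {
        String status;
        while ((status = events.poll()) != null) {
            nodeLock.lock();
            try {
                nodeLog.add("event:" + status);
            } finally {
                nodeLock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread nodeSide = new Thread(() -> {
            for (int i = 0; i < 1000; i++) nodeUpdate("n" + i);
        });
        Thread attemptSide = new Thread(() -> {
            for (int i = 0; i < 1000; i++) attemptReportsStatus("a" + i);
        });
        nodeSide.start();
        attemptSide.start();
        nodeSide.join();
        attemptSide.join();
        drainEvents();
        System.out.println(nodeLog.size()); // prints 2000
    }
}
```

If `attemptReportsStatus` took `nodeLock` directly while holding `attemptLock`, the two threads in `main` could each hold one lock and wait for the other. With the event hand-off, no thread ever waits for the Node lock while holding the AppAttempt lock.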
   
   In addition, there is no need to hold the write lock to protect 
`lastMemoryAggregateAllocationUpdateTime` and `lastResourceSecondsMap`; the 
read lock is enough.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

