hfutatzhanghb commented on PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#issuecomment-1414734286

   > Thanks for the PR @hfutatzhanghb. Curious if you have any thread dumps or 
logs collected (before coming to this conclusion) and would like to share reg 
the issue.
   
   Hi @virajjasani, thanks for your reply. Here are some of the logs.
   First, we added some logging in 
`BPServiceActor.CommandProcessingThread#processCommand`:
   
   
![image](https://user-images.githubusercontent.com/25115709/216499334-66fb3f87-05c8-4baa-b2c1-5f8bba58e7b4.png)
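   For readers who cannot see the screenshot, here is a rough sketch of the kind of timing instrumentation meant here. It is not the actual patch: the class name `CommandTimer`, the method `timeAndLog`, and the 1000 ms threshold are all hypothetical, and it prints to stderr instead of using the DataNode logger.

```java
import java.util.concurrent.TimeUnit;

public class CommandTimer {
    // Hypothetical threshold: only report commands slower than this.
    static final long SLOW_COMMAND_THRESHOLD_MS = 1000;

    /**
     * Runs the given command, measures its wall-clock duration, and reports
     * it when it exceeds the threshold. Returns the elapsed milliseconds.
     */
    static long timeAndLog(String commandName, Runnable command) {
        long start = System.nanoTime();
        command.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        if (elapsedMs >= SLOW_COMMAND_THRESHOLD_MS) {
            System.err.println("Slow command " + commandName + " took " + elapsedMs + " ms");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Stand-in for a slow command: sleep a bit longer than the threshold.
        long elapsed = timeAndLog("demo-command", () -> {
            try {
                Thread.sleep(1200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

   Wrapping the call this way gives one log line per slow command, which makes it easy to grep for the outliers.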
   
   We then grepped the logs and found the following:
   
   
![image](https://user-images.githubusercontent.com/25115709/216498739-db2b23c4-765d-4d54-b23f-428947454914.png)
   
   From these logs we can conclude that the execution time of the 
`processCommandFromActor` method can be very high, in some cases more than 
119 seconds. `processCommandFromActor` takes the same write lock that 
`updateActorStatesFromHeartbeat` uses. Since `updateActorStatesFromHeartbeat` 
is called from the `offerService` method, a slow command can block the 
heartbeat thread for as long as the lock is held.
   
   
![image](https://user-images.githubusercontent.com/25115709/216500941-268a7eab-8988-4ebc-b455-481f6fa850b8.png)
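   To make the contention concrete, here is a minimal standalone sketch. It is not Hadoop code: `HeartbeatLockDemo` and its methods are hypothetical stand-ins, and it assumes the shared lock behaves like a `ReentrantReadWriteLock` write lock. One thread holds the write lock while "processing a command", and the "heartbeat" side measures how long it waits for the same lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HeartbeatLockDemo {
    // Stand-in for the lock shared by command processing and heartbeat handling.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /** Simulates a slow command that holds the write lock for holdMillis. */
    static void processCommand(long holdMillis, CountDownLatch lockHeld) {
        LOCK.writeLock().lock();
        try {
            lockHeld.countDown(); // signal that the write lock is now held
            Thread.sleep(holdMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    /** Simulates the heartbeat-side update; returns ms spent waiting for the lock. */
    static long updateStatesFromHeartbeat() {
        long start = System.nanoTime();
        LOCK.writeLock().lock();
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    /** Runs one slow command and measures how long the heartbeat side is blocked. */
    static long measureHeartbeatStall(long holdMillis) {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread commandThread =
            new Thread(() -> processCommand(holdMillis, lockHeld), "command-processing");
        commandThread.start();
        try {
            lockHeld.await(); // wait until the command thread owns the write lock
            long waited = updateStatesFromHeartbeat();
            commandThread.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heartbeat waited ms: " + measureHeartbeatStall(500));
    }
}
```

   The heartbeat side waits for roughly as long as the command holds the lock, so a command that takes 119 seconds would delay heartbeat handling by about the same amount.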
   
   We already use this feature in our production cluster, and it works well.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

