xinglin commented on PR #6519:
URL: https://github.com/apache/hadoop/pull/6519#issuecomment-1922564371

   > Why doesn't the OOM cause the client to fail with the existing code on 
trunk, i.e. where is the OOM suppressed?
   
   It is not suppressed or caught at all: it crashed the Connection thread, 
which is why the Connection thread does not appear in our thread dump. 
   
   > After your fix, what error will the client fail with? I'm worried that by 
suppressing this OOM (due to thread creation) we will end up with an OOM 
elsewhere and it won't be easy to trace when we have too many open 
connections.
   
   I made a slight change to my PR: it now catches this exception, does some 
cleanup, removes this Connection object from the IPC.client.connections pool, 
and then rethrows the exception, so the client still fails with the original 
OOM. 
   
   So the original code keeps the bad Connection object in the pool when the 
Connection thread crashes, because close() is never called. The new code 
removes that bad Connection object, so a new, good one is created the next 
time a connection is requested. 
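
   To illustrate the catch, clean up, and rethrow pattern described above, here 
is a minimal standalone sketch. It is not the actual Hadoop code: the class 
`ConnectionPoolSketch`, the nested `Connection` class, the `connections` map, 
and the `starter` callback (which stands in for starting the Connection thread) 
are all hypothetical names made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and structure are assumptions, not Hadoop's ipc.Client.
class ConnectionPoolSketch {

    /** Stand-in for a pooled IPC connection. */
    static class Connection {
        final String id;
        boolean closed = false;

        Connection(String id) {
            this.id = id;
        }

        /** Marks the connection closed and removes it from the pool. */
        void close(Map<String, Connection> pool) {
            closed = true;
            pool.remove(id, this); // remove only if this exact object is still mapped
        }
    }

    /** Stand-in for the client's connection pool. */
    static final Map<String, Connection> connections = new ConcurrentHashMap<>();

    /**
     * Looks up or creates a connection, then runs `starter`, which simulates
     * starting the connection thread and may throw OutOfMemoryError.
     */
    static Connection getConnection(String id, Runnable starter) {
        Connection conn = connections.computeIfAbsent(id, Connection::new);
        try {
            starter.run();
        } catch (OutOfMemoryError e) {
            // Clean up so the bad connection does not linger in the pool,
            // then rethrow so the caller still sees the original error.
            conn.close(connections);
            throw e;
        }
        return conn;
    }

    public static void main(String[] args) {
        try {
            getConnection("nn1", () -> {
                throw new OutOfMemoryError("unable to create native thread");
            });
        } catch (OutOfMemoryError e) {
            System.out.println("caller saw: " + e.getMessage());
        }
        System.out.println("bad connection still pooled: " + connections.containsKey("nn1"));
    }
}
```

   In this sketch the error is not swallowed: the caller still receives the 
OutOfMemoryError, and a later call with the same key gets a fresh Connection 
object rather than the one whose thread never started. 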


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


