xinglin commented on PR #6519: URL: https://github.com/apache/hadoop/pull/6519#issuecomment-1922564371
> Why doesn't the OOM cause the client to fail with the existing code on trunk, i.e. where is the OOM suppressed?

It is not suppressed or caught at all: it crashes the Connection thread. That is why we don't see the Connection thread in our thread dump.

> After your fix, what error will the client fail with? I'm worried that by suppressing this OOM (due to thread creation) we will end up with an OOM elsewhere and it won't be easy to trace when we have too many open connections.

I made a slight change to my PR: it now catches this exception, does some cleanup, removes this Connection object from the IPC.client.connections pool, and then rethrows the exception. The original code keeps the bad Connection object around when the Connection thread crashes (because it does not call the close() method). The new code removes that bad Connection object, and a new good one will be created next time.
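To illustrate the catch, clean up, and rethrow pattern described above, here is a minimal sketch. It is not the code from the PR: `ConnectionPoolSketch`, `Conn`, `setup`, `runConnection`, and `getConnection` are hypothetical names, the map only stands in for the client's connection pool, and the OOM is simulated rather than caused by real thread creation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConnectionPoolSketch {
    // Hypothetical stand-in for the IPC client's pool of cached connections.
    static final Map<String, Conn> connections = new ConcurrentHashMap<>();

    static class Conn {
        final String id;

        Conn(String id) {
            this.id = id;
        }

        // Stand-in for the step that can fail with an OOM (assumption:
        // simulated here instead of actually failing to create a thread).
        void setup(boolean simulateOom) {
            if (simulateOom) {
                throw new OutOfMemoryError("unable to create thread (simulated)");
            }
        }

        // Cleanup: remove this object from the pool so it is not reused.
        void close() {
            connections.remove(id, this);
        }
    }

    // Catch the error, clean up, then rethrow so the caller still sees it.
    static void runConnection(Conn conn, boolean simulateOom) {
        try {
            conn.setup(simulateOom);
        } catch (OutOfMemoryError e) {
            conn.close();
            throw e;
        }
    }

    // Return the cached connection for this id, or create a fresh one.
    static Conn getConnection(String id) {
        return connections.computeIfAbsent(id, Conn::new);
    }

    public static void main(String[] args) {
        Conn bad = getConnection("server-1");
        try {
            runConnection(bad, true);
        } catch (OutOfMemoryError e) {
            System.out.println("caller saw: " + e.getMessage());
        }
        Conn fresh = getConnection("server-1");
        System.out.println("new connection object created: " + (fresh != bad));
    }
}
```

In this sketch the error is not hidden from the caller; the only behavioral difference from not catching it is that the failed object is dropped from the pool before the error propagates, so the next lookup creates a fresh one.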
