On Wed, 16 Jun 2021 13:53:38 GMT, Daniel Fuchs <[email protected]> wrote:
> Hi,
>
> Please find below a test-only change to fix some intermittent failures
> observed with the httpclient/websocket tests:
> these tests intermittently and randomly fail with ENOBUFS ("No buffer space
> available").
>
> Some machines in our CI seem to allow a higher level of concurrency while
> being (maybe) configured with lower system resources (such as available
> buffer space for the TCP stack).
>
> Some of the httpclient/websocket tests attempt to fill the socket buffers in
> order to assert some conditions when the buffers are full and writing is
> paused. When the test process terminates, this leaves behind TCP sockets in
> the TIME_WAIT state that still hold system buffer resources in case
> retransmission is needed. When several such tests are run this ends up
> causing random "No buffer space available" errors on other tests (including
> these tests themselves) running concurrently or shortly after on the same
> machine.
>
> This change implements a few tricks to alleviate the situation:
> - configure the tests with smaller send buffers on the client side and
> receive buffers on the server side, in order to limit how much buffer space
> is consumed by the test.
> - when the not-reading server is closed, and before the accepted socket is
> closed, read all available data off the socket buffer in order to free up the
> buffer space that the test has consumed before closing the socket.
> - in some tests that create a large number of HttpClients, limit the number
> of clients created in shared client mode, and add a call to System.gc() and a
> small pause to give time for gc to collect the old clients which are no
> longer referenced.
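
The three tricks listed above can be sketched roughly as follows. This is a minimal illustration using plain java.net sockets, not the code from the changeset: the class name BufferHygieneSketch, the 8 KiB buffer size, the drain timeout, and the helper names are all assumptions made for the example. Unlike the description above, the demo has the client shut down its output so that the drain loop sees end-of-stream deterministically.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; names and sizes are illustrative, not from the patch.
public class BufferHygieneSketch {

    // Assumed "small" buffer size; the OS may round or clamp this hint.
    static final int SMALL_BUFFER = 8 * 1024;

    // Read whatever is pending on the accepted socket before it is closed,
    // so the buffered data is consumed rather than left behind.
    // Stops at end-of-stream or after timeoutMillis without new data.
    static long drain(Socket accepted, int timeoutMillis) throws IOException {
        accepted.setSoTimeout(timeoutMillis);
        InputStream in = accepted.getInputStream();
        byte[] scratch = new byte[4096];
        long total = 0;
        try {
            int n;
            while ((n = in.read(scratch)) != -1) {
                total += n;
            }
        } catch (SocketTimeoutException nothingMorePending) {
            // No more data arrived within the timeout; treat as drained.
        }
        return total;
    }

    // Give the collector a chance to reclaim unreferenced clients.
    // System.gc() is only a hint, so this is best-effort.
    static void collectOldClients(long pauseMillis) {
        System.gc();
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Loopback demo: small receive buffer on the server side, small send
    // buffer on the client side, then drain before closing.
    static long runDemo(int payloadSize) {
        try (ServerSocket server = new ServerSocket()) {
            // Set before bind so that accepted sockets inherit the size.
            server.setReceiveBufferSize(SMALL_BUFFER);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (Socket client = new Socket()) {
                client.setSendBufferSize(SMALL_BUFFER);
                client.connect(server.getLocalSocketAddress());
                try (Socket accepted = server.accept()) {
                    client.getOutputStream().write(new byte[payloadSize]);
                    client.shutdownOutput(); // lets drain() see end-of-stream
                    return drain(accepted, 2000);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("drained " + runDemo(4096) + " bytes");
        collectOldClients(100);
    }
}
```

The payload in the demo is kept smaller than the send buffer so that the blocking write cannot stall while the server side is not yet reading.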
>
> With these changes, I have run the HttpClient tests 200 times on the
> problematic machines without observing any failures (where previously there
> were at least a couple of failures per 50 runs). I also ran tier1 once and
> tier2 twice, and the results came back clean.
>
> I am therefore claiming success (even if it might prove temporary ;-) )
>
> If these failures come back to haunt the CI again after this fix, a further
> remediation policy could be to put the httpclient/websocket directory in
> exclusive test execution mode (in TEST.root) - this seems to work too - but
> cleaning up garbage in the tests themselves seems preferable.

This pull request has now been integrated.

Changeset: 8ea0606a
Author: Daniel Fuchs <[email protected]>
URL: https://git.openjdk.java.net/jdk17/commit/8ea0606aba15911f5bfe2c81a83b42288d97095f
Stats: 93 lines in 12 files changed: 86 ins; 0 del; 7 mod

8268714: [macos-aarch64] 7 java/net/httpclient/websocket tests failed

Reviewed-by: chegar, michaelm

-------------
PR: https://git.openjdk.java.net/jdk17/pull/79