On Fri, 25 Nov 2022 12:15:47 GMT, Daniel Fuchs <[email protected]> wrote:

>> The java/net/httpclient/SpecialHeadersTest.java test has been observed 
>> failing once, due to a selector thread still running after all operations 
>> had terminated. This change slightly alter the test logic to check the 
>> shared client only at the end of the test when all methods have been 
>> executed, and focussed the checks performed by a test method to the only 
>> clients that this method is using. This may or may not prevent the observed 
>> issue to ever reproducing, but should at least provide more information on 
>> where the selector manager thread is at the time it is observed running, 
>> should the test fail again:
>> the logic of the HttpClient should have woken up the selector when the last 
>> operation finished, which should have ensured a prompt termination of the 
>> thread.
>> In addition this change fixes a small race condition in the logic that 
>> attempts to decide if the selector is still alive: the selector should be 
>> considered alive until its run() method has been called and has exited (or 
>> Thread::start has thrown).
>
> Daniel Fuchs has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains five additional 
> commits since the last revision:
> 
>  - Merge branch 'master' into SpecialHeadersTest-8297200
>  - Update test/jdk/java/net/httpclient/ReferenceTracker.java
>    
>    Co-authored-by: Andrey Turbanov <[email protected]>
>  - Fixed race condition in detecting if selector is still alive
>  - 8297200
>  - 8297200

Hello Daniel, I've been looking at this proposed change, the current test and 
the failure logs that I have access to. The changes being proposed here at a 
high level are as follows:

1. Keeping track of whether or not the selector thread was started successfully 
(through `Thread.start()` call). Using that information to (only) decide 
whether a selector is active or not in reference tracking implementation.
2. Cleaning up the (internal) implementation of the (test code) 
`ReferenceTracker`
3. Updating the test to check for references per client tracker instead of all 
trackers.

Doing `#2` and `#3` changes I think are fine (I added some review comments for 
those but those can be addressed in a straightforward way).

`#1` is where I am having more and more questions. But before I get to those, I 
was looking at the test log run that failed and I couldn't find any crucial 
debug logs that our HttpClient generates. That would have given some hint on 
whether or not the selector manager thread shutdown was initiated. Looks like 
the `SpecialHeadersTest` doesn't set the `jdk.internal.httpclient.debug`. That 
also means that our logic in the `beforeMethod` to stop on first failure ended 
up being a no-op. Perhaps we should add that system property and wait for some 
runs to fail to capture the necessary details?

Coming to the failing `testHomeMadeIllegalHeader` test - this test issue a 
request whose `HttpRequest` instance is of a custom class and includes a 
disallowed request header. The test then calls `client.send(req, 
BodyHandlers.ofString());` and expects that to fail with a 
`IllegalArgumentException`. And looking at the logs, it does fail with this 
exception. We then set the `client = null`, run a GC and waits for 500 millis 
for all references and operations to finish. But apparently this test fails 
because the selector manager is still alive. The selector manager can only be 
alive if the facade is non-null (it is null from the logs) or pending operation 
count is more than 0. In this specific test the pending operation count will 
always be 0 since the request failed very early in the `send` operation and we 
never increment the pending operation count at that time. So that means that 
the selector manager thread would have been woken up and it would have started 
the shutdown process. The 
 `shutdown` process in the selector manager can be expensive since it closes 
all connections, but in this specific test there should be no connections at 
all since the request generation itself fails very early. So perhaps that means 
the selector manager thread has just started (and marked itself as alive) but 
hasn't yet reached the `selector.select` statement when the tracker checked if 
it was alive? Looking at the code that seems possible and perhaps explains why 
shortly later the tracker in its logging noted that the selector manager is no 
longer alive. This would then mean that the tracker is (and will always?) be 
inherently racy, isn't it?

Coming to the part `#1` change - so it appears that we now maintain a state if 
the selector manager thread doesn't start (i.e. fails in the rare case where 
`Thread.start()` throws Exception). We consider the selector manager to be 
alive in such cases. But this "aliveness" is only used for reference tracking. 
I think, if the selector manager thread has failed to start, that would mean 
the `HttpClient` instance cannot function, isn't it? But as far as I can see it 
wouldn't start throwing up errors but would silently accumulate requests (I'm 
using the term `requests` loosely here, but I mean the internal `AsyncEvent` 
instances). Did I read the code correctly? If so, should we be somehow 
propagating this thread start failure as actual exceptions whenever requests 
are issued?

-------------

PR: https://git.openjdk.org/jdk/pull/11294

Reply via email to