On Tue, 2 Dec 2025 07:43:01 GMT, Jaikiran Pai <[email protected]> wrote:
> Can I please get a review of this test-only change which improves the > debuggability of the `java/rmi/server/RemoteServer/AddrInUse.java` test? > > As noted in https://bugs.openjdk.org/browse/JDK-8213699, this test fails > intermittently. The test code launches a Thread which does a > `LocateRegistry.createRegistry(port)`. The test then expects that call to > return within (an arbitrary) 10 seconds and if it doesn't, then it considers > that the test has ended up reproducing a bug which would cause a hang in the > implementation of `LocateRegistry.createRegistry(...)` method. > > The 10 seconds is a reasonable timeout, I think even for busy hosts. But we > have seen this test fail because the launched thread which does the > `LocateRegistry.createRegistry(...)` has either not started or completed > within that period. > > The changes in this PR updates that test code to remove the arbitrary 10 > second timeout and instead just wait for the launched thread to complete. If > the test doesn't complete within the configured jtreg test timeout (which by > default is 2 minutes), then the jtreg and its failure handler infrastructure > will gather the necessary thread dump and other states to help debug why the > test timed out. This should help understand such intermittent failures in > future (if it continues to fail). > > I have triggered a tier testing of this change in our CI and will run a test > repeat too. Marked as reviewed by smarks (Reviewer). Overall, good cleanup. A couple bits of history. The RMI tests are replete with these kind of ad-hoc timeouts. We've gotten rid of a lot of them over the years, but some of these are still lurking around, such as in this test. It's good to continue to remove them. One of the reasons this sort of stuff was probably added to individual tests was to facilitate running the tests standalone, that is, outsid the jtreg environment. Thus many tests didn't rely on jtreg to handle timeouts. I think we've moved away from this a long time ago. The priority needs to be reliability of tests run in CI systems, which happens many, many times every day. It's still possible, though somewhat inconvenient, to invoke a single test manually via jtreg, but I think this is the way it should be. I'd suggest renaming the `failure` field and the local variable into which its value is loaded after the thread is joined. The difficulty with "failure" is that it's ambiguous whether it means the _export_ has failed or that the _test_ has failed. (It means the former.) The field is expected to contain an ExportException, so something that suggests that would be preferable. ------------- PR Review: https://git.openjdk.org/jdk/pull/28595#pullrequestreview-3531522499 PR Comment: https://git.openjdk.org/jdk/pull/28595#issuecomment-3603353163
