Hi Stuart,

great proposal. You can count on me when it comes to testing on exotic
platforms such as AIX :)
Regards,
Volker

On Thu, May 1, 2014 at 2:08 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:
> Hi all,
>
> Here's a draft JEP for stabilizing the core libraries regression test suite,
> that is, fixing up the spuriously failing tests. Please review and comment.
>
> Thanks!
>
> s'marks
>
>
>
>
> Title: JDK Core Libraries Test Stabilization
> Author: Stuart Marks
> Organization: Oracle
> Discussion: core-libs-dev@openjdk.java.net
> [...other metadata elided...]
>
> Summary
> -------
>
> The JDK Regression Test Suite has several thousand fully automated
> tests. These tests are valuable and effective in that they serve to
> prevent bugs from entering the code base. However, they suffer from
> many intermittent failures. Many of these failures are "spurious" in
> that they are not caused by bugs in the product. Spurious failures add
> considerable noise to test reports; they make it impossible for
> developers to ascertain whether a particular change has introduced a
> bug; and they obscure actual failures.
>
> The reliability of the regression suite has improved considerably over
> the past few years. However, there are perhaps still 100-200 tests
> that fail intermittently, and most of these failures are spurious.
> This project aims to reduce the number and frequency of spuriously
> failing tests to a level where they are no longer an impediment to
> development.
>
> This project targets tests from the regression suite that cover the
> JDK Core Libraries, including the base packages (java.lang, io, nio,
> util), I18N, Networking, RMI, Security, and Serviceability. JAXP and
> CORBA are also included, although they have relatively few regression
> tests at present.
>
>
> Non-Goals
> ---------
>
> Regression tests for other areas, including Hotspot, Langtools, and
> Client areas, are not included in this project.
>
> This project does not address operational issues that might cause
> builds or test runs to fail, or reports not to be delivered in a
> timely fashion.
>
> This project is not focused on product bugs that cause test
> failures. Such test failures are "good" in that the test suite is
> providing valid information about the product.
>
> Test runs on embedded platforms are not covered by this project.
>
>
> Success Metrics
> ---------------
>
> The rate of fully successful test runs (100% pass) currently stands
> at approximately 0.5%. The goal is to improve this success rate to
> 98%, exclusive of true failures (i.e., those caused by bugs in the
> product). At a 98% success rate, a continuous build system that runs
> ten jobs per day, five days a week, would have one or fewer spurious
> failures per week (2% of 50 weekly runs).
>
>
> Motivation
> ----------
>
> Developers are continually hampered by the unreliability of the
> regression test suite. Intermittently failing tests add significant
> noise to the results of every test run. The consequence is that
> developers cannot tell whether test failures were caused by bugs
> introduced by a recent change or whether they are spurious
> failures. In addition, the intermittent failures mask actual failures
> in the product, slowing development and reducing quality. Developers
> should be able to rely on the test suite telling them accurate
> information: test failures should indicate the introduction of a bug
> into the system, and the absence of test failures should be usable as
> evidence that changes are correct.
>
>
> Description
> -----------
>
> Spurious test failures fall into two broad categories:
>
> - test bugs
> - environmental issues
>
> Our working assumption for most intermittent test failures is that
> they are spurious, and further, that they are caused by bugs in the
> test itself. While it is possible for a product bug to cause an
> intermittent failure, this is relatively rare. The majority of
> intermittent failures encountered so far have indeed proven to be test
> bugs.
>
> "Environmental" issues, such as misconfigured test machines, temporary
> dysfunction on the machine running the test job (e.g., filesystem
> full), or transient network failures, also contribute to spurious
> failures. Tests should be made more robust against these issues where
> possible. Environmental issues should be fed back to the infrastructure
> team for resolution and future infrastructure improvements.
>
> A variety of techniques will be employed to diagnose, track, and help
> develop fixes for intermittently failing tests:
>
> - track all test failures in JBS
> - repeated test runs against the same build
> - gather statistics about failure rates and the number of tests with
>   bugs, and track them continuously
> - investigate pathologies for common test failure modes
> - develop techniques for fixing common test bugs
> - develop test library code to improve commonality across tests and to
>   avoid typical failure modes
> - add instrumentation to tests (and to the test suite) to improve
>   diagnosability
> - exclude tests judiciously, preferably only as a last resort
> - change reviews
> - code inspections
>
>
> Alternatives
> ------------
>
> The most likely alternative to diagnosing and fixing intermittent
> failures is to aggressively exclude intermittently failing tests from
> the test suite. This trades off code coverage in favor of test
> reliability, adding the risk of undetected bug introduction.
>
>
> Testing
> -------
>
> The subject of this project is the test suite itself. The main
> "testing" of the test suite is running it repeatedly in a variety of
> environments, including continuous build-and-test systems, as well as
> recurring "same-binary" test runs on promoted builds. This will help
> flush out intermittent failures and detect newly introduced failures.
>
>
> Risks and Assumptions
> ---------------------
>
> We are working on a long tail of intermittent failures, which may
> become increasingly frustrating as time goes on, resulting in the
> project stalling out.
>
> New intermittent failures may be introduced or discovered more quickly
> than they can be resolved.
>
> The main work of fixing up the tests will be spread across several
> development groups. This requires good cross-group coordination and
> focus.
>
> The culture in the development group has (mostly) been to ignore test
> failures, or to find ways to cope with them. As intermittent failures
> are removed, we hope to decrease the group's tolerance of test failures.
>
>
> Dependences
> -----------
>
> No dependences on other JEPs or components.
>
>
> Impact
> ------
>
> No impact on specific parts of the platform or product, except for
> developer time and effort being spent on it, across various component
> teams.
>
> ==========
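
As one concrete illustration of the "develop test library code ... to avoid
typical failure modes" item in the quoted Description: a frequent source of
intermittent networking-test failures is binding to a hardcoded port that
happens to be in use on the test machine. Below is a minimal sketch of the
kind of helper such a library could provide. The class name and shape are
hypothetical; this is not code from the JEP or from the existing JDK test
library.

    import java.io.IOException;
    import java.net.ServerSocket;

    // Hypothetical helper, for illustration only: ask the OS for a free
    // ephemeral port instead of hardcoding one, which is a common cause of
    // intermittent failures when tests run concurrently on shared machines.
    public class EphemeralPortServer implements AutoCloseable {
        private final ServerSocket socket;

        public EphemeralPortServer() throws IOException {
            socket = new ServerSocket(0);   // port 0 = pick any free port
        }

        public int port() {
            return socket.getLocalPort();   // the port actually assigned
        }

        @Override
        public void close() throws IOException {
            socket.close();
        }
    }

A test would then connect to whatever port() returns instead of a fixed
constant, so concurrent test runs on the same machine cannot collide.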