On 07/10/2025 20:31, Coty Sutherland wrote:
Hi all,
Before I start implementation and submit a PR, I wanted to share a
proposal for some new test targets in build.xml. The idea is to
introduce structured test categories that improve developer productivity
and CI efficiency through faster, more targeted test execution, without
needing to know or use the various fileset patterns.
I'm far more concerned about developer productivity than I am about CI
usage. As a project, we make relatively little use of CI and are under
no pressure to reduce that usage.
Generally:
Test duration varies so much depending on the resources available that
I don't think it makes a good component of the definition of a test set.
I suggest using a percentage of a complete test run on the same hardware
with the same number of test threads, e.g. the smoke test runs in 5% of
the time required for the full test suite.
I don't think the test definitions should be defining things like
running in parallel. Some of the CI environments we use don't have 6
cores. The test definitions should define the tests, with the degree of
parallelism controlled by setting test.threads appropriately for the
environment.
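For example (an untested sketch; the target, property, and path names
here are placeholders, and a real implementation would go through the
existing runtests macro), a target would define only which tests run,
while the caller picks the parallelism:

    <property name="test.threads" value="1"/> <!-- safe default -->

    <target name="test-foo" depends="compile-tests">
      <!-- Ant's <junit> task supports a threads attribute
           (since Ant 1.9.4) when forking one JVM per test -->
      <junit fork="yes" forkmode="perTest" threads="${test.threads}"
             printsummary="yes">
        <classpath refid="test.classpath"/>
        <batchtest todir="${test.reports}">
          <fileset dir="${test.classes}"
                   includes="org/apache/foo/**/Test*.class"/>
        </batchtest>
      </junit>
    </target>

A two-core CI host would then run "ant -Dtest.threads=2 test-foo" while
a developer workstation could use a much higher value.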
Currently, we use GitHub Actions for the sanity check and BuildBot for
the full test run. I see no reason to change that, although we should
check that we are using appropriate values for test.threads on both
platforms. No objection to reviewing what we include in the sanity check.
*New Test Targets*
* ant smoke-test - Runs fast smoke tests (~30 seconds) that verify basic
functionality across all major Tomcat components, including server
startup, the core engine, etc. Tests essential class loading and API
availability.
* ant test-quick - Runs unit tests and critical integration tests (~5
minutes) for development validation. Excludes broader integration
scenarios, performance tests, and complex deployment tests.
I'm not sure I see the need for both of these.
* ant test-components - Runs full component testing (~20 minutes) with
unit tests for specific components in parallel (6 components, each with
its own test target). Excludes cross-component integration tests.
* ant test-integration - Runs cross-component integration tests (~30
minutes) including WebSocket, SSL/TLS, clustering, session management,
authentication, valves, filters, startup lifecycle, and JSP-servlet
integration.
I think it might be hard to draw a definitive line between component and
integration. I tend to look at these as different levels of granularity.
If Dimitris's idea of a single test target with a parameter to specify
the set(s) of tests to run is possible, that could be really good,
especially if it handled overlapping sets and only ran each test once.
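Roughly this (a sketch only; the set names and package patterns are
invented, and the mapping from a -Dtest.sets value to filesets is
omitted): merging the selected sets through a <union> already discards
duplicate resources, so a test belonging to two sets runs once:

    <target name="test-sets" depends="compile-tests">
      <junit fork="yes" forkmode="perTest" threads="${test.threads}"
             printsummary="yes">
        <classpath refid="test.classpath"/>
        <batchtest todir="${test.reports}">
          <!-- <union> de-duplicates overlapping filesets -->
          <union>
            <fileset dir="${test.classes}"
                     includes="org/apache/tomcat/websocket/**/Test*.class"/>
            <fileset dir="${test.classes}"
                     includes="org/apache/catalina/tribes/**/Test*.class"/>
          </union>
        </batchtest>
      </junit>
    </target>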
* ant test-performance - Runs performance tests for benchmarking and
optimization, including timing-sensitive code paths, memory usage, and
throughput tests. Isolated from other categories to prevent flaky failures.
Do we have any flaky performance tests at the moment?
The performance tests tend to fall into one of two categories: those
that compare more than one way of doing the same thing and confirm that
the current Tomcat implementation is using the fastest, and those that
just provide raw numbers for a given operation. I think the former need
to stay in the complete test suite. The latter are (or should be)
already excluded.
* ant test-tribes-system - Runs comprehensive clustering system tests (30+
minutes) for the Tribes clustering component. These are high-resource,
long-running integration scenarios that were previously excluded from the
main test suite due to regular failures, now available for thorough
validation when working on clustering functionality.
Tribes is just another component.
The frequency of the failures is low enough that (as far as I recall)
none of the Tribes tests are currently excluded from the full test run.
Note: the times mentioned above are guesstimates based on running with a
few test threads, since I haven't implemented anything yet.
*Component-Specific Test Targets*
* ant test-component-catalina - Runs all Catalina tests
* ant test-component-coyote - Runs all Coyote tests
* ant test-component-jasper - Runs all JSP tests
* ant test-component-el - Runs all EL tests
* ant test-component-tomcat - Runs all Tomcat utilities, WebSocket,
logging, and JNDI tests
* ant test-component-servlet - Runs all servlet tests
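For illustration only (the macro, property, and pattern names below are
placeholders; a real implementation would reuse the existing runtests
machinery), each of these component targets could be a thin wrapper
around a shared macro:

    <macrodef name="run-component-tests">
      <attribute name="includes"/>
      <sequential>
        <junit fork="yes" forkmode="perTest" threads="${test.threads}"
               printsummary="yes">
          <classpath refid="test.classpath"/>
          <batchtest todir="${test.reports}">
            <fileset dir="${test.classes}" includes="@{includes}"/>
          </batchtest>
        </junit>
      </sequential>
    </macrodef>

    <target name="test-component-el" depends="compile-tests">
      <run-component-tests
          includes="jakarta/el/**/Test*.class org/apache/el/**/Test*.class"/>
    </target>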
*Key Benefits*
For Developers:
- 30-second smoke-test feedback vs a 15-30 minute full suite (depending
on available test threads)
A target for a quick test would be useful. I tend to just do a build.
- Run only relevant component tests for the systems you're working on
I can see the benefits to developers of having test targets that cover
particular functionality. Generally, I run all the tests in a package
through the IDE, but there are certainly times when I need to run tests
in several packages, where having a single test target would be helpful.
- More obvious test targets with specific purpose
- Quick validation with a shorter feedback loop before commits
For CI:
- Shorter test runs across multiple jobs/platforms to reduce costs (if
there were any)
- A 30-second validation (or 5-minute quick tests) in place of the 10-20
minute "smoketest", for faster builds and a notable decrease in compute
time used for every commit
- No need to update ci.yml to exclude new tests that may cause longer
runtimes
*Implementation*
The implementation of this plan would follow existing conventions, using
the same JVM args, properties, and exclude patterns as the current
runtests macro and preserving compatibility with the current test
workflows. Existing test targets wouldn't change; only new ones would be
introduced. The only change to the test suite itself would be the
addition of a SmokeTest designation in file names if we wanted to
include new tests for that target; everything else is just creating
targets from existing filesets/patterns.
I'm not at all a fan of defining inclusion in the smoke tests by file
name. I'd much rather see that group of tests defined by being
explicitly listed in build.xml.
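Something like the following is what I have in mind (the class names
below are placeholders for illustration, not a proposed smoke set):

    <fileset id="smoke.tests" dir="${test.classes}">
      <include name="org/apache/catalina/startup/TestTomcat.class"/>
      <include name="org/apache/el/TestSomething.class"/>
      <!-- membership is reviewed here, in build.xml, rather than
           inferred from a *SmokeTest file-name convention -->
    </fileset>

    <target name="smoke-test" depends="compile-tests">
      <junit fork="yes" forkmode="perTest" threads="${test.threads}"
             printsummary="yes">
        <classpath refid="test.classpath"/>
        <batchtest todir="${test.reports}">
          <fileset refid="smoke.tests"/>
        </batchtest>
      </junit>
    </target>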
Thoughts? If there aren't any objections, I'll start working on a PR :D
It seems that developers are likely to have much more powerful machines
than most of the CI systems we use. Should that factor into our
thinking? Does it actually change anything?
Finally, when running with lots of threads, the time taken by an
individual test can dominate the overall timing (e.g. a long-running
test that starts near the end can continue long after all the other
threads have stopped). As we increase the number of test.threads we use,
both in CI and locally, we might want to look at those long-running
tests and see if we can break them up.
Mark