paul-rogers opened a new pull request #12368: URL: https://github.com/apache/druid/pull/12368
## Description

[Issue #12359](https://github.com/apache/druid/issues/12359) proposes an approach to simplify and streamline integration tests, especially around the developer experience, but also for Travis. See that issue for the background.

This PR is big, but most of that comes from creating revised versions of existing files. Unfortunately, there is no good way using GitHub to compare two copies of the same file. For the most part, these are config files and you can assume that the new versions work (because, when they didn't, the cluster stubbornly refused to start or stay up.)

### Developer Experience

With this framework, it is possible to:

* Do a normal distribution build.
* Build the Docker image in less than a minute. (Most of that is Maven determining what not to do. After the first build, you can use a script to rebuild in a few seconds, depending on what Docker must rebuild.)
* Launch the cluster in a few seconds.
* Debug an integration test as a JUnit test in your favorite IDE.

The result is that integration tests go from being a nightmare to being an efficient way to develop and test code changes. This author used it to create tests for [PR #12222](https://github.com/apache/druid/pull/12222). The process was quick and easy. Not as efficient as just using unit tests (we still want the single-process server), but still pretty good. (By contrast, the new tests were ported to the existing framework, and that is still difficult for the reasons we're trying to address here.)

One huge win is that, with this approach, one can start a Docker cluster and leave it up indefinitely to try out APIs, to create or refactor tests, etc. Though there are many details to get right to use Docker and Docker Compose, once those are addressed, using the cluster becomes quite simple and productive.

### Contents of this First Cut

This PR is a first draft of the approach which provides:

* A new top-level project, `docker-tests`, that holds the new integration test structure.
  (For now, the existing `integration-tests` is left unchanged.)
* Sub-project `testing-tools` to hold code placed into the Docker image.
* Sub-project `test-image` to build the Druid-only test image from the tarball produced in `distribution`. (Dependencies live in their "official" image.)
* Sub-project `base-test` that holds the test code common to the revised integration tests, including file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.
* Sub-project `high-availability`, which is a port of the integration test `high-availability` group, to demonstrate and exercise the new structure.

The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured, to how the Docker Compose scripts are structured, to test configuration. [An extensive set of "readme" files](https://github.com/paul-rogers/druid/blob/220325-docker/docker-tests/README.md) explains those details. Rather than repeat that material here, please consult those files for explanations.

### Limitations

This version is very much a first cut. Everything works for the one converted test group. The new framework is intended to exist in parallel with the current one so we can get started. The new framework is ignored unless you select the Maven profiles which enable it. (See the docs for details.)

There are *many* other test groups not yet touched. A good approach is to use this framework for new integration tests, and to convert old ones when someone needs to modify them. The cost of converting to this framework is low, and the productivity gain is large.

Other limitations include:

* The original tests appear to run not only in Docker, but also against a local QuickStart cluster and against Kubernetes. Neither of these other two modes has been tested in the new framework.
  (Though, it is now so easy to start and use a Docker cluster that it may be easier to use Docker than the QuickStart cluster.)
* The original tests always have security enabled. While it is important to test security, having security enabled makes debugging far harder (by design.) So, this draft has security disabled. The various scripts and configs are pulled aside. The thought is to enable security as an option when needed, and run without it when debugging things other than the security mechanism.
* The supporting classes have the basics, but have been used for only the one integration test group.
* This framework is not yet integrated into Travis. A test that exists only in the new framework won't run in the Travis build. We hope to address that limitation after this PR is merged.

### Next Steps

This PR itself will continue to evolve as some of the final details are sorted out. However, it is at the stage where it will benefit from others taking a look and making suggestions.

The thought is that this PR is large enough already: let's get it reviewed, then tackle the additional issues listed above step by step as the opportunity arises.

## Alternatives

The approach in the PR is based on the existing approach, but re-arranges the parts. Since the integration tests are pretty much "nothing but details", there are many approaches that could be taken. Here are a few that were considered.

* Run the tests as-is in an AWS instance. Because the tests are very difficult to run on a developer machine, many folks set up an AWS instance to run them. While this can work, it is slow: one has to shuffle code from the laptop to the instance and back. Or, just do development on the instance. The tests are not really set up for debugging, so even on the instance, it is still tedious to make and debug test changes.
* Run the tests in Travis as part of a PR. This is the default approach.
  However, it is akin to the development process of old: submit the changes to a batch run, wait many hours for the answers, plow through the logs, find issues, fix them, and repeat. That process was not efficient in the era of punch cards, and is still not very efficient today. A turnaround of a minute or less is the target, which the Travis approach cannot provide.
* Modify the existing integration tests. This is the obvious approach. But, the set of existing ITs is so large that attempting to change everything in one go becomes overwhelming. The chosen approach allows incremental test-by-test conversion without breaking the great mass of existing tests.
* Status-quo. I'm working on a project that requires many integration tests. It is faster to fix the test framework once, and do the tests quickly, than to fight with the framework for each of the required tests.

That said, this PR is all about details. Your thoughts, suggestions and corrections are encouraged to ensure we've got our bases covered.

## Detailed Changes

A number of specific changes are worth calling out that do not appear in the docs.

* A number of config classes are modified to allow the test code to create an instance directly, without the "JSON Config" mechanism.
* The MySQL, ZK and Kafka extensions are slightly modified to expose methods that allow us to use the classes in test-specific clients.
* The Guice `Initialization` class is modified to allow tests to define a client-oriented set of modules. (The standard configuration assumes a server environment, and thus has far more dependencies than we need for tests which act as clients.)
* Similarly, `Lifecycle` is slightly modified to allow its use in a test client.

<hr>

This PR has:

- [X] been self-reviewed.
- [X] added documentation for new or modified features or behaviors.
- [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [X] added integration tests.
- [X] been tested in a test Druid cluster (in the sense that this PR is for running such a cluster in Docker.)
