paul-rogers opened a new pull request #12368:
URL: https://github.com/apache/druid/pull/12368


   ## Description
   
   [Issue #12359](https://github.com/apache/druid/issues/12359) proposes an 
approach to simplify and streamline integration tests, especially around the 
developer experience, but also for Travis. See that issue for the background.
   
   This PR is big, but most of that comes from creating revised versions of 
existing files. Unfortunately, there is no good way using GitHub to compare two 
copies of the same file. For the most part, these are config files and you can 
assume that the new versions work (because, when they didn't, the cluster 
stubbornly refused to start or stay up).
   
   ### Developer Experience
   
   With this framework, it is possible to:
   
   * Do a normal distribution build.
   * Build the Docker image in less than a minute. (Most of that is Maven 
determining what not to do. After the first build, you can use a script to 
rebuild in a few seconds, depending on what Docker must rebuild.)
   * Launch the cluster in a few seconds.
   * Debug an integration test as a JUnit test in your favorite IDE.
   
   The result is that integration tests go from being a nightmare to being an 
efficient way to develop and test code changes. This author used it to create 
tests for [PR #12222](https://github.com/apache/druid/pull/12222). The process 
was quick and easy. Not as efficient as just using unit tests (we still want 
the single-process server), but still pretty good. (By contrast, the new tests 
were ported to the existing framework, and that is still difficult for the 
reasons we're trying to address here.)
   
   One huge win is that, with this approach, one can start a Docker cluster and 
leave it up indefinitely to try out APIs, to create or refactor tests, etc. 
Though there are many details to get right to use Docker and Docker Compose, 
once those are addressed, using the cluster becomes quite simple and productive.
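   
   As a rough illustration of "trying out APIs" against a cluster that is left 
running, the sketch below probes one service's health endpoint from plain Java. 
It is not code from this PR: the class name `ClusterProbe`, the host and the 
port are assumptions (the actual port depends on how the Docker Compose file 
maps ports), and it assumes Java 11+ for `java.net.http`. The `/status/health` 
path is the standard Druid health endpoint.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the PR.
public class ClusterProbe {

  // Builds the health-check URI for one service. Host and port are assumptions.
  static URI healthUri(String host, int port) {
    return URI.create("http://" + host + ":" + port + "/status/health");
  }

  // Returns true only if the service answers HTTP 200 with the body "true".
  // Any I/O failure (for example, connection refused) counts as "not healthy".
  static boolean isHealthy(HttpClient client, URI uri) {
    HttpRequest request = HttpRequest.newBuilder(uri)
        .timeout(Duration.ofSeconds(2))
        .GET()
        .build();
    try {
      HttpResponse<String> response =
          client.send(request, HttpResponse.BodyHandlers.ofString());
      return response.statusCode() == 200 && "true".equals(response.body().trim());
    } catch (IOException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2))
        .build();
    // 8888 is an assumed port mapping; adjust it to match the running cluster.
    URI uri = healthUri("localhost", 8888);
    System.out.println(uri + " healthy: " + isHealthy(client, uri));
  }
}
```

   A check of this kind can run from an IDE while the cluster stays up, which 
is the workflow described above.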
   
   ### Contents of this First Cut
   
   This PR is a first draft of the approach which provides:
   
   * A new top-level project, `docker-tests`, that holds the new integration 
test structure. (For now, the existing `integration-tests` is left unchanged.)
   * Sub-project `testing-tools` to hold code placed into the Docker image.
   * Sub-project `test-image` to build the Druid-only test image from the 
tarball produced in `distribution`. (Dependencies live in their "official" 
image.)
   * Sub-project `base-test` that holds the test code common to the revised 
integration tests, including file-based test configuration, test-specific 
clients, test initialization and updated versions of some of the common test 
support classes.
   * Sub-project `high-availability`, which is a port of the integration test 
`high-availability` group to demonstrate and exercise the new structure.
   
   The integration test setup is primarily a huge mass of details. This 
approach refactors many of those details: from how the image is built and 
configured to how the Docker Compose scripts are structured to test 
configuration. [An extensive set of "readme" 
files](https://github.com/paul-rogers/druid/blob/220325-docker/docker-tests/README.md)
 explain those details. Rather than repeat that material here, please consult 
those files for explanations.
   
   ### Limitations
   
   This version is very much a first cut. Everything works for the one 
converted test group. The new framework is intended to exist in parallel with the 
current one so we can get started. The new framework is ignored unless you 
select the Maven profiles which enable it. (See the docs for details.)
   
   There are *many* other test groups not yet touched. A good approach is to 
use this framework for new integration tests, and to convert old ones when 
someone needs to modify them. The cost of converting to this framework is low, 
and the productivity gain is large.
   
   Other limitations include:
   
   * The original tests appear to run not only in Docker, but also against a 
local QuickStart cluster and against Kubernetes. Neither of these other two 
modes has been tested in the new framework. (Though, it is now so easy to 
start and use a Docker cluster that it may be easier to use Docker than 
the QuickStart cluster.)
   * The original tests always have security enabled. While it is important to 
test security, having security enabled makes debugging far harder (by design). 
So, this draft has security disabled. The various scripts and configs are 
pulled aside. The thought is to enable security as an option when needed, and 
run without it when debugging things other than the security mechanism.
   * The supporting classes have the basics, but have been used for only the 
one integration test group.
   * This framework is not yet integrated into Travis. A test that exists only 
in the new framework won't run in the Travis build. We hope to address that 
limitation after this PR is merged.
   
   ### Next Steps
   
   This PR itself will continue to evolve as some of the final details are 
sorted out. However, it is at the stage where it will benefit from others 
taking a look and making suggestions.
   
   The thought is that this PR is large enough already: let's get it reviewed, 
then tackle the additional issues listed above step by step, as the 
opportunity arises.
   
   ## Alternatives
   
   The approach in the PR is based on the existing approach, but re-arranges 
the parts. Since the integration tests are pretty much "nothing but details", 
there are many approaches that could be taken. Here are a few that were 
considered.
   
   * Run the tests as-is in an AWS instance. Because the tests are very 
difficult to run on a developer machine, many folks set up an AWS instance to 
run them. While this can work, it is slow: one has to shuffle code from the 
laptop to the instance and back. Or, just do development on the instance. The 
tests are not really set up for debugging, so even on the instance, it is still 
tedious to make and debug test changes.
   * Run the tests in Travis as part of a PR. This is the default approach. 
However, it is akin to the development process of old: submit the changes to a 
batch run, wait many hours for the answers, plow through the logs, find issues, 
fix them, and repeat. That process was not efficient in the era of punch cards, 
and is still not very efficient today. A turnaround of a minute or less is the 
target, which the Travis approach cannot provide.
   * Modify the existing integration tests. This is the obvious approach. But, 
the set of existing ITs is so large that attempting to change everything in one 
go becomes overwhelming. The chosen approach allows incremental test-by-test 
conversion without breaking the great mass of existing tests.
   * Status-quo. I'm working on a project that requires many integration tests. 
It is faster to fix the test framework once, and do the tests quickly, than to 
fight with the framework for each of the required tests.
   
   That said, this PR is all about details. Your thoughts, suggestions and 
corrections are encouraged to ensure we've got our bases covered.
   
   ## Detailed Changes
   
   A number of specific changes are worth calling out that do not appear in the 
docs.
   
   * A number of config classes are modified to allow the test code to create 
an instance directly, without the "JSON Config" mechanism.
   * The MySQL, ZK and Kafka extensions are slightly modified to expose methods 
that allow us to use the classes in test-specific clients.
   * The Guice `Initialization` class is modified to allow tests to define a 
client-oriented set of modules. (The standard configuration assumes a server 
environment, and thus has far more dependencies than we need for tests which 
act as clients.)
   * Similarly, `Lifecycle` is slightly modified to allow its use in a test 
client.
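   
   The first bullet above can be pictured with a small sketch: a config class 
that offers a public constructor, so a test client can build an instance 
directly instead of going through the "JSON Config" mechanism. The class 
`ExampleClientConfig` and its fields are hypothetical and are not taken from 
this PR. In Druid such a constructor would typically also carry Jackson 
annotations, which are omitted here to keep the sketch dependency-free.

```java
import java.util.Objects;

// Hypothetical config class for illustration only; not a class from the PR.
public class ExampleClientConfig {
  private final String host;
  private final int port;

  // Public constructor: a test can create the config with plain Java,
  // with no injector or properties file involved.
  public ExampleClientConfig(String host, int port) {
    this.host = Objects.requireNonNull(host, "host");
    if (port <= 0 || port > 65535) {
      throw new IllegalArgumentException("port out of range: " + port);
    }
    this.port = port;
  }

  public String getHost() {
    return host;
  }

  public int getPort() {
    return port;
  }

  // Convenience for test clients that need a base URL.
  public String toUrl() {
    return "http://" + host + ":" + port;
  }

  public static void main(String[] args) {
    // Host and port are example values.
    ExampleClientConfig config = new ExampleClientConfig("localhost", 8081);
    System.out.println(config.toUrl());
  }
}
```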
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [X] added integration tests.
   - [X] been tested in a test Druid cluster (in the sense that this PR is for 
running such a cluster in Docker.)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


