When evaluating technical alternatives, I think it’s helpful to look at data. Has anyone recently tried to run the entire dunit test suite in parallel without Docker? How many tests would need to be changed? IIRC, there would also be non-trivial work in product code around statics and system properties.
Maybe pursuing a dual short-term / long-term approach ends up being the most
realistic option. @Jake, have you tried using the Testcontainers project with
dunit? Maybe it’s possible to use GenericContainer with an open RMI port.

Anthony

> On Jun 30, 2020, at 1:20 PM, Donal Evans <doev...@vmware.com> wrote:
>
> +1 for fixing the tests. It'll be a lot of work, but it'll only be a lot of
> work once, as opposed to taking on maintenance of our own custom Docker
> plugin, which will be an ongoing effort and not at all immune from getting
> broken again at some point in the future.
> ________________________________
> From: Jinmei Liao <jil...@vmware.com>
> Sent: Tuesday, June 30, 2020 12:28 PM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: Us vs Docker vs Gradle vs JUnit
>
> I would vote for fixing the tests to use Gradle's normal forking. If we are
> going to invest time and effort, let's invest in an option that can reduce
> our dependencies.
> ________________________________
> From: Jacob Barrett <jabarr...@vmware.com>
> Sent: Tuesday, June 30, 2020 11:30 AM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Us vs Docker vs Gradle vs JUnit
>
> All,
>
> We are in a bit of a pickle. As you may recall, a few years back, in an
> effort to both stabilize and parallelize integration, distributed, and other
> integration/system-like tests, we use Docker. Many of the tests reused the
> same ports for services, which caused them to fail or interact with each
> other when run in parallel. By using Docker to isolate each test we put a
> bandage on that issue. The plugin overrides Gradle’s default forked runner by
> starting the runners in Docker containers and marshaling the execution
> parameters to those Dockerized runners.
>
> The Docker test plugin is effectively unmaintained. The author seems content
> to keep it compatible with Gradle 4. We forked it to work with Gradle 5 and
> to fix various other issues we have hit over the years. We have shared
> patches in the past with little luck getting them merged, and it's still only
> compatible with Gradle 4.8 at best. I spent some time trying to port it to
> Gradle 6, but it's going to be a larger undertaking given that Gradle 6 is
> fully compatible with Java modules. They added new members throughout to
> handle modules in addition to class paths.
>
> Long story short, because our tests can’t be parallelized without a container
> system, we are stuck. We can’t go to JUnit 5 without updating the Docker
> plugin (potentially minor changes). We can’t go to Gradle 6 without updating
> the Docker plugin (potentially huge changes). Being stuck is not a good
> place to be. I see two paths out of this:
>
> 1) We buckle down and fix the tests so they can run in parallel via Gradle's
> normal forking mechanism. I know some effort has already gone into this with
> our new rules for starting servers. We need to go further.
>
> 2) Fully invest in the Docker plugin. We would need to fork it off as a fully
> maintained sub-project of Geode and add support for both Gradle 6 and
> JUnit 5.
>
> My money is on fixing the tests. It is clear, at least from my exhaustive
> searching, that nobody in the Gradle and JUnit communities is isolating their
> tests with containers. They are creating containers to host services for
> system-level testing; see the Testcontainers project. The tests themselves
> run in the local kernel space (not in a container).
>
> We made this push in the C++ and .NET tests, a much smaller set of tests, and
> it works great. The framework takes care to create clusters that do not
> interact with each other on the same host. Some things in Geode make this
> harder, like the HTTP service not supporting ephemeral port selection and
> gfsh not providing machine-readable output about ephemeral port selections.
> We use port knocking to prevent the OS from assigning the port ephemerally to
> another process. The framework knocks (opens and then closes) all the ports
> it needs for the server/locator services and starts the services explicitly
> on those ports. Because of port recycling rules in the OS, another ephemeral
> port request won’t get those ports for some time after they are closed. It's
> not perfect, but it works. Fixing Geode to support ephemeral port selection
> and a better reporting mechanism for those port choices would be better still.
> Also, we only start the services necessary for the test; for example, we
> don’t start the HTTP ports if they aren’t going to be used.
>
> I would love some feedback and thoughts on this issue. Does anyone else see a
> different path forward?
>
> -Jake
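For anyone unfamiliar with the port-knocking approach Jake describes for the C++ and .NET framework, here is a minimal Java sketch of just the knock step, under stated assumptions: bind a socket to port 0 so the OS picks a free ephemeral port, record the assigned number, then close the socket so a locator or server can later be started explicitly on that port. `PortKnocker` and `knock` are hypothetical names for illustration only; this is not the framework's actual code or any Geode API, and it relies on the OS port-recycling behavior described in the thread rather than guaranteeing the ports stay free.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the "knock" step; not a Geode or framework API. */
public final class PortKnocker {
    private PortKnocker() {}

    /**
     * Asks the OS for {@code count} distinct ephemeral ports, records them,
     * then closes every socket so a service can be started on each port.
     */
    public static List<Integer> knock(int count) throws IOException {
        List<ServerSocket> sockets = new ArrayList<>();
        List<Integer> ports = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                // Port 0 means "let the OS choose a free ephemeral port".
                ServerSocket socket = new ServerSocket(0);
                sockets.add(socket);
                ports.add(socket.getLocalPort());
            }
        } finally {
            // Close only after all ports are assigned, so no port is handed out twice.
            for (ServerSocket socket : sockets) {
                socket.close();
            }
        }
        return ports;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> ports = knock(3);
        // Hypothetical role assignment: locator, cache server, and HTTP (only if the test needs it).
        System.out.printf("locator=%d server=%d http=%d%n", ports.get(0), ports.get(1), ports.get(2));
    }
}
```

Holding every socket open until all ports have been assigned is what keeps the returned ports distinct; the window between closing a socket and the service binding that port itself is the "not perfect" part of the approach.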