When evaluating technical alternatives I think it’s helpful to look at data.  
Has anyone recently tried to run the entire dunit test suite in parallel 
without Docker?  How many tests need to be changed?  IIRC, there would be 
non-trivial work in product code around statics and system properties as well.

Maybe pursuing a dual short-term / long-term approach ends up being the most 
realistic option.  

@Jake have you tried using the Testcontainers project with dunit?  Maybe it’s 
possible to use GenericContainer with an open RMI port.

Anthony


> On Jun 30, 2020, at 1:20 PM, Donal Evans <doev...@vmware.com> wrote:
> 
> +1 for fixing the tests. It'll be a lot of work, but it'll only be a lot of 
> work once, as opposed to taking on maintenance of our own custom Docker 
> plugin, which will be an ongoing effort and not at all immune from getting 
> broken again at some point in the future.
> ________________________________
> From: Jinmei Liao <jil...@vmware.com>
> Sent: Tuesday, June 30, 2020 12:28 PM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Re: Us vs Docker vs Gradle vs JUnit
> 
> I would vote for fixing the tests to use Gradle's normal forking. If we are 
> going to invest time and effort, let's invest in an option that can reduce 
> our dependencies.
> ________________________________
> From: Jacob Barrett <jabarr...@vmware.com>
> Sent: Tuesday, June 30, 2020 11:30 AM
> To: dev@geode.apache.org <dev@geode.apache.org>
> Subject: Us vs Docker vs Gradle vs JUnit
> 
> All,
> 
> We are in a bit of a pickle. As you recall from a few years back, in an 
> effort to both stabilize and parallelize the integration, distributed, and 
> other integration/system-like tests, we use Docker. Many of the tests reused 
> the same ports for services, which caused them to fail or interact with each 
> other when run in parallel. By using Docker to isolate a test we put a 
> bandage on that issue. The plugin overrides Gradle’s default forked runner by 
> starting the runners in Docker containers and marshaling the execution 
> parameters to those Dockerized runners.
> 
> The Docker test plugin is effectively unmaintained. The author seems content 
> with keeping it compatible with Gradle 4. We forked it to work with Gradle 5 
> and to fix various other issues we have hit over the years. We have shared 
> patches in the past with little luck in having them merged, and it is still 
> only compatible with Gradle 4.8 at best. I spent some time trying to port it 
> to Gradle 6, but it's going to be a larger undertaking given that Gradle 6 is 
> fully compatible with Java modules. They added new members throughout to 
> handle modules in addition to class paths.
> 
> Long story short, because our tests can’t be parallelized without a 
> container system, we are stuck. We can’t go to JUnit 5 without updating the 
> Docker plugin (potentially minor changes). We can’t go to Gradle 6 without 
> updating the Docker plugin (potentially huge changes). Being stuck is not a 
> good place. I see two paths out of this:
> 
> 1) We buckle down and fix the tests so they can run in parallel via the 
> normal forking mechanism of Gradle. I know some effort has been expended on 
> this by using our new rules for starting servers. We would need to go 
> further.
> 
> 2) Fully invest in the Docker plugin. We would need to fork this off as a 
> fully maintained sub-project of Geode. We would need to add to it support for 
> both Gradle 6 and JUnit 5.
> 
> My money is on fixing the tests. It is clear, at least from my exhaustive 
> searching, that nobody in the Gradle and JUnit communities is isolating their 
> tests with containers. They are creating containers to host services for 
> system-level testing; see the Testcontainers project. The tests themselves 
> run in the local kernel space (not in a container).
> 
> We made this push in the C++ and .NET tests, a much smaller set of tests, and 
> it works great. The framework takes care to create clusters that do not 
> interact with each other on the same host. Some things in Geode make this 
> harder than others, like the http service not supporting ephemeral port 
> selection, and gfsh not providing machine-readable output about ephemeral 
> port selections. We use port knocking to prevent the OS from assigning the 
> port ephemerally to another process. The framework knocks on (opens and then 
> closes) all the ports it needs for the server/locator services and starts 
> them explicitly on those ports. Because of port recycling rules in the OS, 
> another ephemeral port request won’t get those ports for some time after they 
> are closed. It's not perfect but it works. Fixing Geode to support ephemeral 
> port selection and a better reporting mechanism for those port choices would 
> be ideal. Also, we only start the services necessary for the test; for 
> example, we don’t start the http ports if they aren’t going to be used.
> 
> I would love some feedback and thoughts on this issue. Does anyone else see a 
> different path forward?
> 
> -Jake
> 
> 
> 
> 
> 
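
For readers following the thread, the port-knocking idea Jake describes for the C++ and .NET tests can be sketched roughly as below. This is only an illustrative sketch and not the code of that framework: the class name PortKnocker and the method reservePorts are hypothetical, and the sketch assumes that binding to port 0 makes the OS pick a free port, which is then released so that a test can start a server or locator explicitly on it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Geode or its test frameworks.
final class PortKnocker {

    // "Knocks" on count ports: opens a listening socket on an OS-chosen
    // port for each one, records the port number, then closes them all.
    static int[] reservePorts(int count) throws IOException {
        List<ServerSocket> sockets = new ArrayList<>();
        int[] ports = new int[count];
        try {
            for (int i = 0; i < count; i++) {
                // Port 0 asks the OS to choose any free port.
                ServerSocket socket =
                        new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
                sockets.add(socket);
                ports[i] = socket.getLocalPort();
            }
        } finally {
            // Keep every socket open until all ports are chosen so the
            // numbers are distinct, then release them together.
            for (ServerSocket socket : sockets) {
                socket.close();
            }
        }
        return ports;
    }

    public static void main(String[] args) throws IOException {
        for (int port : reservePorts(3)) {
            System.out.println("reserved port " + port);
        }
    }
}
```

As the original message notes, this is not perfect: nothing stops another process from binding one of these ports between the close and the moment the test starts its service, so the approach relies on the OS not handing out recently used ports right away.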
