+1 for fixing the tests. It'll be a lot of work, but it'll only be a lot of 
work once, as opposed to taking on maintenance of our own custom Docker plugin, 
which will be an ongoing effort and not at all immune from getting broken again 
at some point in the future.
________________________________
From: Jinmei Liao <jil...@vmware.com>
Sent: Tuesday, June 30, 2020 12:28 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Us vs Docker vs Gradle vs JUnit

I would vote for fixing the tests to use Gradle's normal forking. If we are 
going to invest time and effort, let's invest in an option that can reduce our 
dependencies.
________________________________
From: Jacob Barrett <jabarr...@vmware.com>
Sent: Tuesday, June 30, 2020 11:30 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Us vs Docker vs Gradle vs JUnit

All,

We are in a bit of a pickle. As you may recall, a few years back, in an effort 
to both stabilize and parallelize the integration, distributed, and other 
system-like tests, we started using Docker. Many of the tests reused the same 
ports for services, which caused them to fail or interact with each other when 
run in parallel. By using Docker to isolate each test we put a bandage on that 
issue. The plugin overrides Gradle’s default forked runner by starting the 
runners in Docker containers and marshaling the execution parameters to those 
Dockerized runners.

The Docker test plugin is effectively unmaintained. The author seems content 
with keeping it compatible with Gradle 4. We forked it to work with Gradle 5 and 
to fix various other issues we have hit over the years. We have shared patches 
in the past with little luck in having them merged, and it is still only 
compatible with Gradle 4.8 at best. I spent some time trying to port it to 
Gradle 6, but it's going to be a larger undertaking given that Gradle 6 is fully 
Java modules compatible. They added new members throughout to handle modules in 
addition to class paths.

Long story short, because our tests can’t be parallelized without a container 
system, we are stuck. We can’t go to JUnit 5 without updating the Docker plugin 
(potentially minor changes). We can’t go to Gradle 6 without updating the 
Docker plugin (potentially huge changes). Being stuck is not a good place to 
be. I see two paths out of this:

1) We buckle down and fix the tests so they can run in parallel via the normal 
forking mechanism of Gradle. I know some effort has already been expended on 
this by using our new rules for starting servers. We would need to go further.

2) Fully invest in the Docker plugin. We would need to fork this off as a fully 
maintained sub-project of Geode. We would need to add support to it for both 
Gradle 6 and JUnit 5.

My money is on fixing the tests. It is clear, at least from my exhaustive 
searching, that nobody in the Gradle and JUnit communities is isolating their 
tests with containers. They are creating containers to host services for 
system-level testing; see the Testcontainers project. The tests themselves run 
in the local kernel space (not in a container).

We made this push in the C++ and .NET tests, a much smaller set of tests, and 
it works great. The framework takes care to create clusters that do not 
interact with each other on the same host. Some things in Geode make this 
harder than others, like the HTTP service not supporting ephemeral port 
selection, and gfsh not providing machine-readable output about ephemeral port 
selections. We use port knocking to prevent the OS from assigning the port 
ephemerally to another process. The framework knocks on (opens and then closes) 
all the ports it needs for the server/locator services and then starts those 
services explicitly on those ports. Because of port recycling rules in the OS, 
another ephemeral port request won’t get those ports for some time after they 
are closed. It's not perfect, but it works. Fixing Geode to support ephemeral 
port selection and a better reporting mechanism for those port choices would 
be ideal. Also, we only start the services necessary for the test; for example, 
we don’t start the HTTP service if it isn’t going to be used.
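
For anyone who has not seen the port knocking trick, here is a rough sketch of 
the idea in Java. It is only an illustration of the approach described above, 
not the code from the C++/.NET framework; the class and method names 
(PortKnocker, knock) are made up for this example. It relies on the OS not 
reissuing a recently closed ephemeral port right away, which narrows the race 
but does not remove it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Geode or its test frameworks.
final class PortKnocker {

  private PortKnocker() {}

  /**
   * Binds `count` sockets to OS-assigned ephemeral ports, records the port
   * numbers, and then closes every socket. All sockets stay open until the
   * last one is bound, so the OS cannot return the same port twice.
   */
  static List<Integer> knock(int count) {
    List<ServerSocket> held = new ArrayList<>();
    List<Integer> ports = new ArrayList<>();
    try {
      for (int i = 0; i < count; i++) {
        ServerSocket socket = new ServerSocket(0); // port 0 = let the OS choose
        held.add(socket);
        ports.add(socket.getLocalPort());
      }
      return ports;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // Release the ports so the services under test can bind them explicitly.
      for (ServerSocket socket : held) {
        try {
          socket.close();
        } catch (IOException ignored) {
          // Best effort; nothing useful to do if close fails.
        }
      }
    }
  }
}
```

A test would call PortKnocker.knock(2) and pass the two returned numbers as the 
explicit locator and server ports when starting its cluster. If Geode could bind 
to port 0 itself and report the chosen port, the knock-then-rebind step (and 
its small race window) would not be needed.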

I would love some feedback and thoughts on this issue. Does anyone else see a 
different path forward?

-Jake




