Thought I'd keep this thread warm a bit. If you've built the TinkerPop repo
recently, you've probably noticed that it's taking a really long time these
days to get a simple `mvn clean install` completed. We've produced tons of
tests that are adding to build times, and while we have lots of tests, that
does NOT mean:

1. we need to execute all of them all of the time, or that
2. we have good coverage

I think that we will need to start optimizing our unit test runs (and the
build in general) to get us back to a point where we can get a build in
less than 15 minutes (10 would be better, but I'm not sure we'll get there
as an initial step). Just from the test perspective, I think this will mean:

1. More unit tests that mock interfaces/classes over full-stack integration
tests (see the sketch after this list)
2. Moving more unit tests into the integration test phase so they stay out of
the standard build
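To make the first point concrete, here's a minimal sketch of the style I
mean, assuming Mockito is on the test classpath (normalizeLabel() is a
made-up helper, purely for illustration):

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.apache.tinkerpop.gremlin.structure.Vertex;
    import org.junit.Test;

    public class LabelUtilTest {

        // made-up helper under test - illustration only
        static String normalizeLabel(final Vertex v) {
            return v.label().trim().toLowerCase();
        }

        @Test
        public void shouldNormalizeLabelWithoutOpeningAGraph() {
            // the Vertex is a mock, so there's no Graph instance, no disk
            // I/O and no expensive setup/teardown
            final Vertex v = mock(Vertex.class);
            when(v.label()).thenReturn(" Person ");

            assertEquals("person", normalizeLabel(v));
        }
    }

A test like that runs in milliseconds no matter what the provider's storage
backend looks like.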

So with all that in mind, I've got a local branch that adds a -Dcoverage
option that does two things:

1. builds an aggregated report of unit test data to show us what's running
long
2. builds an aggregated test coverage report that shows us what code is
tested and what is not
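(The idea being that a plain mvn clean install stays exactly as it is today,
and mvn clean install -Dcoverage opts into the extra instrumentation and
reporting.)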

So far this is only set up for unit tests (not integration tests) and it's
really just a proof of concept as I don't have hadoop/spark/giraph in the mix
yet for coverage. As I alluded to earlier in this thread, I was going to use
JaCoCo now that it supports Java 8 and does a better job with multi-module
builds. It works fine, but I had to create a gremlin-coverage module to make
it all work. That kinda stinks, so I opted instead to repurpose
gremlin-benchmark to hold the coverage configuration. It just means that
gremlin-benchmark would now have some additional dependencies and the new
config for the "coverage" maven profile. If everyone thinks that's ok, I will
make this part of 3.2.2. I don't think that infringes too much on
gremlin-benchmark, but that's just my opinion - to me that's better than yet
another module.

Given that there have been no objections in this thread, I will likely create
some JIRA tickets based on what I've suggested here so that I can get to
work on that first thing for 3.3.0 (and try to speed up the 3.2.2 build
where possible).


On Tue, Sep 20, 2016 at 11:16 AM, Stephen Mallette <spmalle...@gmail.com>
wrote:

> I think these are the ones that contain logic and are sort of dynamically
> driven:
>
> ElementFeatures.willAllowId(Object)
> VertexPropertyFeatures.willAllowId(Object)
> VertexFeatures.getCardinality(String)
>
> I was thinking that some graphs might return static values for these in
> which case caching would work. Obviously, schema-driven graphs would have
> trouble with getCardinality(), though I don't remember the contexts in
> which any of these are used - my analysis didn't go that far.
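>
> As a rough sketch of why that last one is hard to cache (the schema lookup
> below is a made-up API, purely for illustration):
>
>     // a graph with static features could safely have its answers cached:
>     public VertexProperty.Cardinality getCardinality(final String key) {
>         return VertexProperty.Cardinality.single;  // same answer every time
>     }
>
>     // but in a schema-driven graph the answer depends on live state, so a
>     // cached value could go stale (schema.cardinalityOf() is made up):
>     public VertexProperty.Cardinality getCardinality(final String key) {
>         return schema.cardinalityOf(key);
>     }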
>
>
> On Tue, Sep 20, 2016 at 10:54 AM, Jason Plurad <plur...@gmail.com> wrote:
>
>> Nice discussion thread, Stephen. I've tinkered around minimally with
>> writing a graph implementation, so hopefully we'll get more feedback from
>> others. From what I have done, +1 on killing @OptOut test annotations.
>> They seem out of place on the Graph impl class.
>>
>> You mentioned "there is at least one method that could be called on
>> Features that is
>> typically dynamically driven based on schema" -- which method is that?
>>
>> -- Jason
>>
>> On Mon, Sep 19, 2016 at 4:33 PM, Stephen Mallette <spmalle...@gmail.com>
>> wrote:
>>
>> > I've spent the middle portion of the day reviewing our test
>> > infrastructure and related open tickets and have some ideas to make
>> > some things better. I titled this post for 3.3.0, but, in truth, I'm
>> > not sure what must be 3.3.0 and what might yet be useful and good for
>> > 3.2.x. I'm also using this email as a way to organize my notes/ideas
>> > from the day, so apologies for dumping a lot of stuff here at once.
>> >
>> > (1) Of all the things I came up with, I think the biggest breaker is
>> > this one: have one uber test suite in gremlin-test. In other words,
>> > merge gremlin-groovy-test into gremlin-test and get rid of the former
>> > altogether. Then, stop publishing test artifacts out of hadoop-gremlin
>> > (and wherever else we might be doing that). We can make the groovy and
>> > hadoop dependencies optional so that providers who aren't using them
>> > don't get them sucked in and can just depend on them as needed.
>> >
>> > (2) Next biggest breaker - how does everyone feel about killing OptOut
>> > and OptIn, getting those concepts out of gremlin-core, and making them
>> > features of gremlin-test? I've heard at least two Graph providers
>> > mention a problem where they want to "OptOut" more at the GraphProvider
>> > level as opposed to the Graph level, because their GraphProvider
>> > configurations do more to drive that setting than the Graph does. I
>> > don't think we lose anything by moving "OptOut" except for the
>> > describeGraph() functionality:
>> >
>> > http://tinkerpop.apache.org/docs/current/reference/#describe-graph
>> >
>> > which I'm not sure is that big a deal to worry about. That was a bit of
>> > a nice idea that didn't really develop any further than where it is
>> > right now.
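>> >
>> > For anyone who hasn't used it, this is roughly what today's Graph-level
>> > opt-out looks like (the test/method names here are just illustrative):
>> >
>> >     @Graph.OptOut(
>> >             test = "org.apache.tinkerpop.gremlin.structure.VertexTest",
>> >             method = "shouldSupportUuidIds",
>> >             reason = "this graph only supports long ids")
>> >     public class MyGraph implements Graph {
>> >         // ...
>> >     }
>> >
>> > Moving this to gremlin-test would mean expressing the same exclusion on
>> > the provider's GraphProvider (or test configuration) instead of on the
>> > Graph implementation itself.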
>> >
>> > (3) We currently tie the GraphProvider to a specific configuration of
>> > a Graph instance, so every time you want a slight permutation on that
>> > configuration, you need to create a new GraphProvider instance. I
>> > think that we can simplify that, cut down on the proliferation of
>> > those instances, and in the same moment offer some added flexibility.
>> > I was digging through JUnit docs/code and I think there is a way for
>> > us to create a "GraphProviderSource" which would annotate a test
>> > (rather than @GraphProviderClass). The GraphProviderSource would
>> > produce a list of GraphProvider instances to run each test in a suite
>> > with. So, if the GraphProviderSource produced 4 different
>> > GraphProvider instances, it would execute each test in the suite 4
>> > times (one for each GraphProvider).
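>> >
>> > Nothing like this exists yet, so treat this as a strawman (the
>> > @GraphProviderSource annotation and the provider classes below are all
>> > hypothetical; imports elided):
>> >
>> >     // hypothetical replacement for @GraphProviderClass on a suite test
>> >     @RunWith(StructureStandardSuite.class)
>> >     @GraphProviderSource(TinkerGraphProviders.class)
>> >     public class TinkerGraphStructureStandardTest {}
>> >
>> >     // hypothetical source of providers - the suite would execute every
>> >     // test once per GraphProvider returned here
>> >     public class TinkerGraphProviders implements Supplier<List<GraphProvider>> {
>> >         @Override
>> >         public List<GraphProvider> get() {
>> >             return Arrays.asList(
>> >                     new DefaultTinkerGraphProvider(),     // default config
>> >                     new NumberIdTinkerGraphProvider(),    // custom id manager
>> >                     new IndexedTinkerGraphProvider(),     // with indices
>> >                     new PersistentTinkerGraphProvider()); // persistence on
>> >         }
>> >     }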
>> >
>> > (4) I think this approach is nice because it spreads into something
>> > else that I think is important to us: getting maximum value for time
>> > out of our tests. As we add GLVs and more tests (I think that without
>> > integration tests right now, we're over 12,000 tests), the time it
>> > takes to do a basic mvn clean install is getting longer and longer. We
>> > want to keep that as short as possible while maximizing code coverage.
>> > To that end, I'll make several points:
>> >
>> > + JaCoCo is now good with Java 8 (I think it has been for a while, but
>> > I hadn't noticed). I worked with it a bit today and we should be able
>> > to get a good aggregate test coverage report with it (assuming we are
>> > ok with adding a new "gremlin-coverage" module to maven - stinks, but
>> > perhaps isn't so different than having added gremlin-benchmark in some
>> > respects). If we have that, we can find out what combinations of
>> > GraphProviders give us the best coverage for time and make that our
>> > standard testing profile.
>> > + We can build some fairly exotic GraphProviderSource implementations
>> > that can help us test all possible configuration options for
>> > TinkerGraph, cover ranges of settings in Neo4jGraph, or randomize the
>> > returned GraphProviders - these could all be options we execute in
>> > docker during code freeze week (and perhaps periodically during our dev
>> > cycles) to ensure we're not breaking anything that the standard mvn
>> > clean install configuration doesn't cover.
>> > + If that works, we can eliminate the use of Random in our test suite
>> > for a standard mvn clean install, thus eliminating the chance of some
>> > non-deterministic behavior. Rather than be "random" we just test all
>> > the cases (see the sketch after this list).
>> > + Perhaps we could have different maven profiles that run different
>> > GraphProviderSource implementations. I'm thinking that those might be
>> > triggered from different docker runs to help parallelize the tests and
>> > allow us to test more permutations more quickly?
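>> >
>> > On the Random point, the idea is just to swap something like this
>> > (illustrative, not an actual test of ours - assertCardinalityBehavior()
>> > stands in for whatever a test actually checks):
>> >
>> >     // non-deterministic: picks one cardinality at random per run
>> >     final VertexProperty.Cardinality[] all = VertexProperty.Cardinality.values();
>> >     final VertexProperty.Cardinality c = all[new java.util.Random().nextInt(all.length)];
>> >     assertCardinalityBehavior(c);
>> >
>> > for something like this:
>> >
>> >     // deterministic: every run covers every case
>> >     for (final VertexProperty.Cardinality c : VertexProperty.Cardinality.values()) {
>> >         assertCardinalityBehavior(c);
>> >     }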
>> >
>> > (5) Finally, I think we could speed up our test suite if we could
>> > figure out a way to cache Graph.Features in the test suite. A lot of
>> > tests get "ignored" because of test requirements, but the test suite
>> > requires a Graph instance to check those requirements against the
>> > Features. For some providers, creating the Graph instance introduces
>> > disk I/O even when the test will be ignored because of the feature.
>> > That setup/teardown is expensive and ends up slowing the tests. If we
>> > could cache those somehow and thus avoid the Graph instance creation,
>> > we'd save some processing - I suspect it would be helpful to us
>> > internally with Neo4j. The trick of course is that the Features
>> > implementation can't be dynamically driven, and there is at least one
>> > method that could be called on Features that is typically dynamically
>> > driven based on schema. Very few tests use that however, so perhaps
>> > there is some way to work around that problem.
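>> >
>> > A very rough sketch of the caching idea, assuming we could ask the
>> > GraphProvider for its features without opening a Graph
>> > (getStaticFeatures() is a hypothetical method):
>> >
>> >     // one Features instance per GraphProvider class, created without
>> >     // ever instantiating a Graph
>> >     private static final Map<Class<?>, Graph.Features> FEATURES_CACHE =
>> >             new ConcurrentHashMap<>();
>> >
>> >     // in the suite, before any Graph is created for the test:
>> >     final Graph.Features features = FEATURES_CACHE.computeIfAbsent(
>> >             graphProvider.getClass(),
>> >             k -> graphProvider.getStaticFeatures());  // hypothetical
>> >     if (!features.vertex().supportsUserSuppliedIds())
>> >         return;  // ignore the test - no Graph creation, no disk I/O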
>> >
>> > Well, my brain has been dumped. Thoughts welcome.
>> >
>>
>
>
