Adding coverage to gremlin-benchmark sounds good to me. If we come up with any other dev-specific tooling that we'd like to add, maybe it would make sense to rename gremlin-benchmark to something like gremlin-tools or gremlin-dev.
--Ted

On Fri, Oct 7, 2016 at 5:20 PM, Stephen Mallette <spmalle...@gmail.com> wrote:

> Thought I'd keep this thread warm a bit. If you've built the TinkerPop repo
> recently, you'll have noticed that it's taking a really long time these
> days to get a simple `mvn clean install` completed. We've produced tons of
> tests that are adding to build times, and I think that while we have lots
> of tests, it does NOT mean:
>
> 1. we need to execute all of them all of the time, or that
> 2. we have good coverage
>
> I think that we will need to start optimizing our unit test runs (and the
> build in general) to get us back to a point where we can get a build in
> less than 15 minutes (10 would be better, but I'm not sure we'll get there
> as an initial step). Just from the test perspective, I think this will
> mean:
>
> 1. more unit tests that mock interfaces/classes rather than full-stack
> integration tests
> 2. moving more unit tests to integration tests
>
> So with all that in mind, I've got a local branch that adds a -Dcoverage
> option that does two things:
>
> 1. builds an aggregated report of unit test timing data to show us what's
> running long
> 2. builds an aggregated test coverage report that shows us what code is
> tested and what is not
>
> So far this is only set up for unit tests (not integration tests), and
> it's really just a model, as I don't have hadoop/spark/giraph in the mix
> yet for coverage. As I alluded to earlier in this thread, I was going to
> use JaCoCo now that it supports Java 8 and does a better job with
> multi-module builds. It works fine, but I had to create a gremlin-coverage
> module to make it all work. That kinda stinks, so I opted instead to
> repurpose gremlin-benchmark to hold the coverage configuration. It just
> means that gremlin-benchmark would now have some additional dependencies
> and the new config for the "coverage" Maven profile. If everyone thinks
> that's ok, I will make this part of 3.2.2.
> I don't think that infringes too much on gremlin-benchmark, but that's
> just my opinion - to me that's better than yet another module.
>
> Given that there have been no objections in this thread, I will likely
> create some JIRA tickets based on what I've suggested here so that I can
> get to work on that first thing for 3.3.0 (and try to speed up the 3.2.2
> build where possible).
>
> On Tue, Sep 20, 2016 at 11:16 AM, Stephen Mallette <spmalle...@gmail.com> wrote:
>
> > I think these are the ones that contain logic and are, sorta,
> > dynamically driven:
> >
> > ElementFeatures.willAllowId(Object)
> > VertexPropertyFeatures.willAllowId(Object)
> > VertexFeatures.getCardinality(String)
> >
> > I was thinking that some graphs might return static values for these,
> > in which case caching would work. Obviously, schema-driven graphs would
> > have trouble with getCardinality(), though I don't remember the
> > contexts in which any of these are used - my analysis didn't go that
> > far.
> >
> > On Tue, Sep 20, 2016 at 10:54 AM, Jason Plurad <plur...@gmail.com> wrote:
> >
> > > Nice discussion thread, Stephen. I've tinkered around minimally with
> > > writing a graph implementation, so hopefully we'll get more feedback
> > > from others. From what I have done, +1 on killing @OptOut test
> > > annotations. They seem out of place on the Graph impl class.
> > >
> > > You mentioned "there is at least one method that could be called on
> > > Features that is typically dynamically driven based on schema" --
> > > which method is that?
> > >
> > > -- Jason
> > >
> > > On Mon, Sep 19, 2016 at 4:33 PM, Stephen Mallette <spmalle...@gmail.com> wrote:
> > >
> > > > I've spent the middle portion of the day reviewing our test
> > > > infrastructure and related open tickets, and I have some ideas to
> > > > make some things better. I titled this post for 3.3.0, but, in
> > > > truth, I'm not sure what must be 3.3.0 and what might yet be
> > > > useful and good for 3.2.x.
> > > > I'm also using this email as a way to organize my notes/ideas from
> > > > the day, so apologies if I'm dumping a lot of stuff here to follow.
> > > >
> > > > (1) Of all the things I came up with, I think the biggest breaker
> > > > is this one: have one uber test suite in gremlin-test. In other
> > > > words, merge gremlin-groovy-test into gremlin-test and get rid of
> > > > it altogether. Then, stop publishing test artifacts out of
> > > > hadoop-gremlin (and wherever else we might be doing that). We can
> > > > make the groovy and hadoop dependencies optional so that providers
> > > > who aren't using them don't have them sucked in and can just
> > > > depend on them as needed.
> > > >
> > > > (2) Next biggest breaker - how does everyone feel about killing
> > > > OptOut and OptIn and moving those concepts out of gremlin-core and
> > > > into features of gremlin-test? I've heard at least two Graph
> > > > providers mention a problem where they want to "OptOut" at the
> > > > GraphProvider level as opposed to the Graph level, as their
> > > > configurations in the GraphProvider do more to drive that setting
> > > > than the Graph does. I don't think we lose anything by moving
> > > > "OptOut" except for the describeGraph() functionality:
> > > >
> > > > http://tinkerpop.apache.org/docs/current/reference/#describe-graph
> > > >
> > > > which I'm not sure is that big a deal to worry about. That was a
> > > > bit of a nice idea that didn't really develop any further than
> > > > where it is right now.
> > > >
> > > > (3) We currently tie the GraphProvider to a specific configuration
> > > > of a Graph instance, so every time you want a slight permutation
> > > > on that configuration, you need to create a new GraphProvider
> > > > instance. I think we can simplify that, cut down on the
> > > > proliferation of those instances, and in the same moment offer
> > > > some added flexibility.
> > > > I was digging through JUnit docs/code, and I think there is a way
> > > > for us to create a "GraphProviderSource" which would annotate a
> > > > test (rather than @GraphProviderClass). The GraphProviderSource
> > > > would produce a list of GraphProvider instances to run each test
> > > > in a suite with. So, if the GraphProviderSource produced 4
> > > > different GraphProvider instances, it would execute each test in
> > > > the suite 4 times (once for each GraphProvider).
> > > >
> > > > (4) I think this approach is nice because it spreads into
> > > > something else that I think is important to us: getting maximum
> > > > value for time out of our tests. As we add GLVs and more tests (I
> > > > think that, without integration tests, we're at over 12,000 tests
> > > > right now), the time it takes to do a basic mvn clean install is
> > > > getting longer and longer. We want that as short as possible while
> > > > maximizing code coverage. To that end, I'll make several points:
> > > >
> > > > + JaCoCo is now good with Java 8 (I think it has been for a while,
> > > > but I hadn't noticed). I worked with it a bit today, and we should
> > > > be able to get a good aggregate test coverage report with it
> > > > (assuming we are ok with adding a new "gremlin-coverage" module to
> > > > Maven - stinks, but perhaps isn't so different than having added
> > > > gremlin-benchmark in some respects). If we have that, we can find
> > > > out which combinations of GraphProviders give us the best coverage
> > > > for time and make that our standard testing profile.
> > > > + We can build some fairly exotic GraphProviderSource
> > > > implementations that help us test all possible configuration
> > > > options for TinkerGraph, cover ranges of settings in Neo4jGraph,
> > > > or randomize the returned GraphProviders - these could all be
> > > > options we execute in Docker during code-freeze week (and perhaps
> > > > periodically during our dev cycles) to ensure we're not breaking
> > > > anything that the standard configuration of just mvn clean install
> > > > doesn't exercise.
> > > > + If that works, we can eliminate the use of Random in our test
> > > > suite for a standard mvn clean install, thus eliminating the
> > > > chance of non-deterministic behavior. Rather than be "random", we
> > > > just test all the cases.
> > > > + Perhaps we could have different Maven profiles that run
> > > > different GraphProviderSource implementations. I'm thinking that
> > > > those might be triggered from different Docker runs to help
> > > > parallelize the tests and allow us to test more permutations more
> > > > quickly???
> > > >
> > > > (5) Finally, I think we could speed up our test suite if we could
> > > > figure out a way to cache Graph.Features in the test suite. A lot
> > > > of tests get "ignored" because of test requirements, but the test
> > > > suite requires a Graph instance to check those requirements
> > > > against the Features. For some providers, creating the Graph
> > > > instance introduces disk I/O even when the test will be ignored
> > > > because of the feature. That setup/teardown is expensive and ends
> > > > up slowing the tests. If we could cache those somehow and thus
> > > > avoid the Graph instance creation, we'd save some processing - I
> > > > suspect it would be helpful to us internally with Neo4j.
> > > > The trick, of course, is that the Features implementation can't
> > > > be dynamically driven, and there is at least one method that could
> > > > be called on Features that is typically dynamically driven based
> > > > on schema. Very few tests use that, however, so perhaps there is
> > > > some way to work around that problem.
> > > >
> > > > Well, my brain has been dumped. Thoughts welcome.
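To make the -Dcoverage idea discussed above concrete, a profile of the kind Stephen describes (activated by the `coverage` property and wiring in the jacoco-maven-plugin with an aggregated report) could look roughly like the sketch below. This is an illustration of the approach only; the version number, goal bindings, and phase are assumptions, not the actual configuration on his branch:

```xml
<!-- Sketch only: a "coverage" profile activated by -Dcoverage. The
     report-aggregate goal is why a module that depends on the others
     (here, the repurposed gremlin-benchmark) is needed to host it. -->
<profile>
  <id>coverage</id>
  <activation>
    <property><name>coverage</name></property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
        <version>0.7.7.201606060606</version>
        <executions>
          <!-- attach the JaCoCo agent to the surefire JVM -->
          <execution>
            <goals><goal>prepare-agent</goal></goals>
          </execution>
          <!-- roll per-module exec data into one aggregated report -->
          <execution>
            <id>report</id>
            <phase>verify</phase>
            <goals><goal>report-aggregate</goal></goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

Run as `mvn clean install -Dcoverage` to activate the profile.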
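The "GraphProviderSource" proposed in point (3) could take roughly the following shape. This is a hypothetical sketch: `GraphProvider`, `GraphProviderSource`, `TinkerGraphPermutationSource`, and `PermutationRunner` are all illustrative stand-ins rather than TinkerPop or JUnit API, and the configuration key is only an example. The point it shows is that one source enumerates configuration permutations, so a suite no longer needs a separate provider class per permutation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real GraphProvider contract: it hands the
// suite the base configuration used to open a Graph.
interface GraphProvider {
    Map<String, Object> getBaseConfiguration();
}

// The proposed idea: a source produces a list of GraphProvider permutations
// instead of one fixed provider class per configuration.
interface GraphProviderSource {
    List<GraphProvider> providers();
}

// Example source: one provider per id-manager setting, so a suite of N
// tests would execute N * 3 times.
class TinkerGraphPermutationSource implements GraphProviderSource {
    public List<GraphProvider> providers() {
        final List<GraphProvider> result = new ArrayList<>();
        for (final String idManager : Arrays.asList("LONG", "INTEGER", "UUID")) {
            result.add(() -> {
                final Map<String, Object> conf = new HashMap<>();
                conf.put("gremlin.tinkergraph.vertexIdManager", idManager);
                return conf;
            });
        }
        return result;
    }
}

// Toy runner standing in for the JUnit machinery: fan each test out across
// every provider the source produces, returning the total execution count.
class PermutationRunner {
    static int runSuite(final GraphProviderSource source, final List<Runnable> tests) {
        int executions = 0;
        for (final GraphProvider provider : source.providers()) {
            for (final Runnable test : tests) {
                test.run(); // a real runner would first open a Graph from provider.getBaseConfiguration()
                executions++;
            }
        }
        return executions;
    }
}
```

With 2 tests and 3 permutations, the runner performs 6 executions, which is exactly the "4 providers, 4 runs per test" behavior described in the email.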
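Point (5)'s Features caching could be sketched as below, under the assumption the email itself states is required: the provider's Features are static rather than schema-driven. `FeaturesCache` and everything in it are hypothetical names for illustration; the real Graph.Features interface is much richer than the single method shown here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of point (5): cache a provider's Features so the
// suite can evaluate feature-requirement checks without opening a Graph
// (and paying its disk I/O) for tests that will be ignored anyway.
class FeaturesCache {
    // Minimal stand-in for Graph.Features - just enough to decide "ignore?".
    interface Features {
        boolean supportsTransactions();
    }

    private static final Map<String, Features> CACHE = new ConcurrentHashMap<>();
    static int graphOpens = 0; // instrumentation for this sketch only

    // graphOpener simulates the expensive Graph construction we want to
    // skip; it is invoked at most once per provider key.
    static Features featuresFor(final String providerKey, final Supplier<Features> graphOpener) {
        return CACHE.computeIfAbsent(providerKey, k -> {
            graphOpens++;
            return graphOpener.get();
        });
    }
}
```

The second and subsequent lookups for the same provider key return the cached Features without touching the opener, which is the setup/teardown savings the email is after.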