Re: [DISCUSS] Test Suite Changes for 3.3.0

Stephen Mallette Sat, 08 Oct 2016 08:05:44 -0700

yeah - i actually thought about "gremlin-tools" and almost wondered if we
shouldn't do:


gremlin-tools
+-gremlin-benchmarks
+-gremlin-coverage

for 3.3.0. The added flexibility to have independent poms for these things
might turn out useful. I sorta like that idea.

On Sat, Oct 8, 2016 at 10:47 AM, Ted Wilmes <twil...@gmail.com> wrote:

> Adding coverage to gremlin-benchmark sounds good to me.  If we come up with
> any other dev specific tooling that we'd like to add, maybe it would make
> sense
> to just rename gremlin-benchmark to something like gremlin-tools or
> gremlin-dev.
>
> --Ted
>
> On Fri, Oct 7, 2016 at 5:20 PM, Stephen Mallette <spmalle...@gmail.com>
> wrote:
>
> > Thought I'd keep this thread warm a bit. If you've built the TinkerPop
> repo
> > recently, you would realize it's taking a really long time these days to
> > get a simple `mvn clean install` completed. We've produced tons of tests
> > that are adding to build times and I think that while we have lots of
> > tests, it does NOT mean:
> >
> > 1. we need to execute all of them all of the time or that
> > 2. we have good coverage
> >
> > I think that we will need to start optimizing our unit test runs (and the
> > build in general) to get us back to a point where we can get a build in
> > less than 15 minutes (10 would be better, but I'm not sure we'll get
> there
> > as an initial step). Just from the test perspective, I think this will
> > mean:
> >
> > 1. More unit tests that mock interfaces/classes over full stack
> integration
> > tests
> > 2. Move more unit tests to integration tests
> >
> > So with all that in mind, I've got a local branch that adds a -Dcoverage
> > option that does two things:
> >
> > 1. builds an aggregated report of unit test data to show us what's
> running
> > long
> > 2. builds an aggregated test coverage report that shows us what code is
> > tested and what is not
> >
> > So far this only is setup for unit tests (not integration tests) and it's
> > really just a model as I don't have hadoop/spark/giraph in the mix yet
> for
> > coverage. As I alluded to earlier in this thread, I was going to use
> jacoco
> > now that it supports java 8 and does a better job with multi-module
> builds.
> > It works fine but I had to create a gremlin-coverage module to make it
> all
> > work. That kinda stinks, so I opted instead to repurpose
> gremlin-benchmark
> > to hold the coverage configuration. It just means that gremlin-benchmark
> > would now have some additional dependencies and the new config for the
> > "coverage" maven profile. If everyone thinks that's ok, I will make this
> > part of 3.2.2. I don't think that infringes too much on
> gremlin-benchmark,
> > but that's just my opinion - to me that's better than yet another module.
> >
> > Given that there's been no objections in this thread, I will likely
> create
> > some JIRA tickets based on what i've suggested here so that I can get to
> > work on that first thing for 3.3.0 (and try to speed up the 3.2.2 build
> > where possible).
> >
> >
> > On Tue, Sep 20, 2016 at 11:16 AM, Stephen Mallette <spmalle...@gmail.com
> >
> > wrote:
> >
> > > I think these are the ones that contain logic and are dynamically sorta
> > > driven:
> > >
> > > ElementFeatures.willAllowId(Object)
> > > VertexPropertyFeatures.willAllowId(Object)
> > > VertexFeatures.getCardinality(String)
> > >
> > > I was thinking that some graphs might return static values for these in
> > > which case caching would work. Obviously, schema driven graphs would
> have
> > > trouble with getCardinality(), though I don't remember the contexts in
> > > which any of these are used - my analysis didn't go that far.
> > >
> > >
> > > On Tue, Sep 20, 2016 at 10:54 AM, Jason Plurad <plur...@gmail.com>
> > wrote:
> > >
> > >> Nice discussion thread, Stephen. I've tinkered around minimally with
> > >> writing a graph implementation, so hopefully we'll get more feedback
> > from
> > >> others. From what I have done, +1 on killing @OptOut test annotations.
> > >> They
> > >> seem out of place on the Graph impl class.
> > >>
> > >> You mentioned "there is at least one method that could be called on
> > >> Features that is
> > >> typically dynamically driven based on schema" -- which method is that?
> > >>
> > >> -- Jason
> > >>
> > >> On Mon, Sep 19, 2016 at 4:33 PM, Stephen Mallette <
> spmalle...@gmail.com
> > >
> > >> wrote:
> > >>
> > >> > I've spent the middle portion of the day reviewing our test
> > >> infrastructure
> > >> > and related open tickets and have some ideas to make some things
> > >> better. I
> > >> > titled this post for 3.3.0, but, in truth, I'm not sure what must be
> > >> 3.3.0
> > >> > and what might yet be useful and good for 3.2.x. I'm also using this
> > >> email
> > >> > as a way to organize my notes/ideas from the day, so apologies if
> I'm
> > >> > dumping a lot of stuff here to follow.
> > >> >
> > >> > (1) Of all the things I came up with, I think the biggest breaker is
> > >> this
> > >> > one: have one uber test suite in gremlin-test. In other words, merge
> > >> > gremlin-groovy-test to gremlin-test and get rid of that all
> together.
> > >> Then.
> > >> > stop publishing test artifacts out of hadoop-gremlin (and wherever
> > else
> > >> we
> > >> > might be doing that). We can make groovy and hadoop dependencies
> > >> optional
> > >> > so that if providers aren't using them, they don't have to have them
> > >> sucked
> > >> > in and can just depend on them as needed.
> > >> >
> > >> > (2) Next biggest breaker - how does everyone feel about killing
> OptOut
> > >> and
> > >> > OptIn and getting those concepts out of gremlin-core and into
> features
> > >> of
> > >> > gremlin-test. I've heard at least two Graph providers mention a
> > problem
> > >> > where they want to "OptOut" more at the GraphProvider level as
> opposed
> > >> to
> > >> > the Graph level as their configurations in the GraphProvider do more
> > to
> > >> > drive that setting than the Graph does. I don't think we lose
> anything
> > >> by
> > >> > moving "OptOut" except for the describeGraph() functionality:
> > >> >
> > >> > http://tinkerpop.apache.org/docs/current/reference/#describe-graph
> > >> >
> > >> > which I'm not sure is that big a deal to worry about. That was a bit
> > of
> > >> a
> > >> > nice idea that didn't really develop any further than where it is
> > right
> > >> > now.
> > >> >
> > >> > (3) We currently tied the GraphProvider to a specific configuration
> > of a
> > >> > Graph instance. So every time you want a slight permutation on that
> > >> > configuration, you need to create a new GraphProvider instance. I
> > think
> > >> > that we can simplify that and cut down on the proliferation of those
> > >> > instances and in the same moment offer some added flexibility. I was
> > >> > digging through JUnit docs/code and I think there is a way for us to
> > >> create
> > >> > a "GraphProviderSource" which would annotate a test (rather than
> > >> > @GraphProviderClass). The GraphProviderSource would produce a list
> of
> > >> > GraphProvider instances to run each test in a suite with. So, if the
> > >> > GraphProviderSource produced 4 different GraphProvider instances, it
> > >> would
> > >> > execute each test in the suite 4 times (one for each GraphProvider).
> > >> >
> > >> > (4) I think this approach is nice because it spreads into something
> > else
> > >> > that I think is important to us: getting maximum value for time out
> of
> > >> our
> > >> > tests. As we add GLVs and more tests (I think that without
> integration
> > >> > tests right now, we're over 12000 tests), the time it takes to do a
> > >> basic
> > >> > mvn clean install is getting longer and longer. We want that that as
> > >> short
> > >> > as possible while maximizing code coverage. To that end, I'll make
> > >> several
> > >> > points:
> > >> >
> > >> > + jacoco is now good with java 8 (i think it has been for a while,
> > but i
> > >> > hadn't noticed). i worked with it a bit today and we should be able
> to
> > >> get
> > >> > a good aggregate test coverage report with it (assuming we are ok
> with
> > >> > adding a new "gremlin-coverage" module to maven - stinks, but
> perhaps
> > >> isn't
> > >> > so different than having added gremlin-benchmarks in some respects).
> > If
> > >> we
> > >> > have that we can find out what combinations of GraphProviders give
> us
> > >> the
> > >> > best coverage for time and make that our standard testing profile.
> > >> > + We can build some fairly exotic GraphProviderSource
> implementations
> > >> that
> > >> > can help us test all possible configuration options for TinkerGraph
> or
> > >> > cover ranges of settings in Neo4jGraph or randomize the returned
> > >> > GraphProviders - these could all be options we execute in docker
> > during
> > >> > code freeze week (and perhaps periodically during our dev cycles) to
> > >> ensure
> > >> > we're not breaking anything as a result of running the "maximized"
> > >> > configuration of just mvn clean install.
> > >> > + If that works, we can eliminate the use or Random in our test
> suite
> > >> for a
> > >> > standard mvn clean install thus eliminating the chance of some
> > >> > non-deterministic behavior. Rather than be "random" we just test all
> > the
> > >> > cases.
> > >> > + Perhaps we could have different maven profiles that ran different
> > >> > GraphProviderSource implementations. I'm thinking that those might
> be
> > >> > triggered from different docker runs to help parallelize the tests
> and
> > >> > allow us to test more permutations more quickly???
> > >> >
> > >> > (5) Finally, I think we could speed up our test suite if we could
> > figure
> > >> > out a way to cache Graph.Features in the test suite. A lot of tests
> > get
> > >> > "ignored" because of test requirements, but the test suite requires
> a
> > >> Graph
> > >> > instance to check those requirements against the Features. For some
> > >> > providers, creating the Graph instances introduces disk I/O even
> when
> > >> the
> > >> > test will be ignored because of the feature. That setup/teardown is
> > >> > expensive and ends up slowing the tests. If we could cache those
> > somehow
> > >> > and thus avoid the Graph instance creation, we'd save some
> processing
> > -
> > >> I
> > >> > suspect it would be helpful to us internally with Neo4j. The trick
> of
> > >> > course is that the Features implementation can't be dynamically
> driven
> > >> and
> > >> > there is at least one method that could be called on Features that
> is
> > >> > typically dynamically driven based on schema. Very few tests use
> that
> > >> > however, so perhaps there is some way to workaround that problem.
> > >> >
> > >> > Well, my brain has been dumped. Thoughts welcome.
> > >> >
> > >>
> > >
> > >
> >
>

Re: [DISCUSS] Test Suite Changes for 3.3.0

Reply via email to