Adding coverage to gremlin-benchmark sounds good to me. If we come up with any other dev-specific tooling that we'd like to add, maybe it would make sense to rename gremlin-benchmark to something like gremlin-tools or gremlin-dev.
--Ted

On Fri, Oct 7, 2016 at 5:20 PM, Stephen Mallette <spmalle...@gmail.com> wrote:

> Thought I'd keep this thread warm a bit. If you've built the TinkerPop repo
> recently, you'll have noticed that it's taking a really long time these
> days to get a simple `mvn clean install` completed. We've produced tons of
> tests that are adding to build times, and I think that while we have lots
> of tests, it does NOT mean:
>
> 1. we need to execute all of them all of the time, or that
> 2. we have good coverage
>
> I think that we will need to start optimizing our unit test runs (and the
> build in general) to get us back to a point where we can get a build in
> less than 15 minutes (10 would be better, but I'm not sure we'll get there
> as an initial step). Just from the test perspective, I think this will
> mean:
>
> 1. more unit tests that mock interfaces/classes rather than full-stack
> integration tests
> 2. moving more unit tests to integration tests
>
> So with all that in mind, I've got a local branch that adds a -Dcoverage
> option that does two things:
>
> 1. builds an aggregated report of unit test timing data to show us what's
> running long
> 2. builds an aggregated test coverage report that shows us what code is
> tested and what is not
>
> So far this is only set up for unit tests (not integration tests), and
> it's really just a model, as I don't have hadoop/spark/giraph in the mix
> yet for coverage. As I alluded to earlier in this thread, I was going to
> use JaCoCo now that it supports Java 8 and does a better job with
> multi-module builds. It works fine, but I had to create a gremlin-coverage
> module to make it all work. That kinda stinks, so I opted instead to
> repurpose gremlin-benchmark to hold the coverage configuration. It just
> means that gremlin-benchmark would now have some additional dependencies
> and the new config for the "coverage" Maven profile. If everyone thinks
> that's ok, I will make this part of 3.2.2.
> I don't think that infringes too much on gremlin-benchmark, but that's
> just my opinion - to me that's better than yet another module.
>
> Given that there have been no objections in this thread, I will likely
> create some JIRA tickets based on what I've suggested here so that I can
> get to work on that first thing for 3.3.0 (and try to speed up the 3.2.2
> build where possible).
>
> On Tue, Sep 20, 2016 at 11:16 AM, Stephen Mallette <spmalle...@gmail.com> wrote:
>
> > I think these are the ones that contain logic and are, sorta,
> > dynamically driven:
> >
> > ElementFeatures.willAllowId(Object)
> > VertexPropertyFeatures.willAllowId(Object)
> > VertexFeatures.getCardinality(String)
> >
> > I was thinking that some graphs might return static values for these,
> > in which case caching would work. Obviously, schema-driven graphs would
> > have trouble with getCardinality(), though I don't remember the
> > contexts in which any of these are used - my analysis didn't go that
> > far.
> >
> > On Tue, Sep 20, 2016 at 10:54 AM, Jason Plurad <plur...@gmail.com> wrote:
> >
> > > Nice discussion thread, Stephen. I've tinkered around minimally with
> > > writing a graph implementation, so hopefully we'll get more feedback
> > > from others. From what I have done, +1 on killing @OptOut test
> > > annotations. They seem out of place on the Graph impl class.
> > >
> > > You mentioned "there is at least one method that could be called on
> > > Features that is typically dynamically driven based on schema" --
> > > which method is that?
> > >
> > > -- Jason
> > >
> > > On Mon, Sep 19, 2016 at 4:33 PM, Stephen Mallette <spmalle...@gmail.com> wrote:
> > >
> > > > I've spent the middle portion of the day reviewing our test
> > > > infrastructure and related open tickets, and I have some ideas to
> > > > make some things better. I titled this post for 3.3.0, but, in
> > > > truth, I'm not sure what must be 3.3.0 and what might yet be
> > > > useful and good for 3.2.x.
> > > > I'm also using this email as a way to organize my notes/ideas from
> > > > the day, so apologies if I'm dumping a lot of stuff here to follow.
> > > >
> > > > (1) Of all the things I came up with, I think the biggest breaker
> > > > is this one: have one uber test suite in gremlin-test. In other
> > > > words, merge gremlin-groovy-test into gremlin-test and get rid of
> > > > it altogether. Then, stop publishing test artifacts out of
> > > > hadoop-gremlin (and wherever else we might be doing that). We can
> > > > make the groovy and hadoop dependencies optional so that providers
> > > > who aren't using them don't have them sucked in and can just
> > > > depend on them as needed.
> > > >
> > > > (2) Next biggest breaker - how does everyone feel about killing
> > > > OptOut and OptIn and moving those concepts out of gremlin-core and
> > > > into features of gremlin-test? I've heard at least two Graph
> > > > providers mention a problem where they want to "OptOut" at the
> > > > GraphProvider level as opposed to the Graph level, as their
> > > > configurations in the GraphProvider do more to drive that setting
> > > > than the Graph does. I don't think we lose anything by moving
> > > > "OptOut" except for the describeGraph() functionality:
> > > >
> > > > http://tinkerpop.apache.org/docs/current/reference/#describe-graph
> > > >
> > > > which I'm not sure is that big a deal to worry about. That was a
> > > > bit of a nice idea that didn't really develop any further than
> > > > where it is right now.
> > > >
> > > > (3) We currently tie the GraphProvider to a specific configuration
> > > > of a Graph instance, so every time you want a slight permutation
> > > > on that configuration, you need to create a new GraphProvider
> > > > instance. I think we can simplify that, cut down on the
> > > > proliferation of those instances, and in the same moment offer
> > > > some added flexibility.
> > > > I was digging through JUnit docs/code, and I think there is a way
> > > > for us to create a "GraphProviderSource" which would annotate a
> > > > test (rather than @GraphProviderClass). The GraphProviderSource
> > > > would produce a list of GraphProvider instances to run each test
> > > > in a suite with. So, if the GraphProviderSource produced 4
> > > > different GraphProvider instances, it would execute each test in
> > > > the suite 4 times (once for each GraphProvider).
> > > >
> > > > (4) I think this approach is nice because it spreads into
> > > > something else that I think is important to us: getting maximum
> > > > value for time out of our tests. As we add GLVs and more tests (I
> > > > think that, without integration tests, we're at over 12,000 tests
> > > > right now), the time it takes to do a basic mvn clean install is
> > > > getting longer and longer. We want that as short as possible while
> > > > maximizing code coverage. To that end, I'll make several points:
> > > >
> > > > + JaCoCo is now good with Java 8 (I think it has been for a while,
> > > > but I hadn't noticed). I worked with it a bit today, and we should
> > > > be able to get a good aggregate test coverage report with it
> > > > (assuming we are ok with adding a new "gremlin-coverage" module to
> > > > Maven - stinks, but perhaps isn't so different than having added
> > > > gremlin-benchmark in some respects). If we have that, we can find
> > > > out which combinations of GraphProviders give us the best coverage
> > > > for time and make that our standard testing profile.
> > > > + We can build some fairly exotic GraphProviderSource
> > > > implementations that help us test all possible configuration
> > > > options for TinkerGraph, cover ranges of settings in Neo4jGraph,
> > > > or randomize the returned GraphProviders - these could all be
> > > > options we execute in Docker during code-freeze week (and perhaps
> > > > periodically during our dev cycles) to ensure we're not breaking
> > > > anything that the standard configuration of just mvn clean install
> > > > doesn't exercise.
> > > > + If that works, we can eliminate the use of Random in our test
> > > > suite for a standard mvn clean install, thus eliminating the
> > > > chance of non-deterministic behavior. Rather than be "random", we
> > > > just test all the cases.
> > > > + Perhaps we could have different Maven profiles that run
> > > > different GraphProviderSource implementations. I'm thinking that
> > > > those might be triggered from different Docker runs to help
> > > > parallelize the tests and allow us to test more permutations more
> > > > quickly???
> > > >
> > > > (5) Finally, I think we could speed up our test suite if we could
> > > > figure out a way to cache Graph.Features in the test suite. A lot
> > > > of tests get "ignored" because of test requirements, but the test
> > > > suite requires a Graph instance to check those requirements
> > > > against the Features. For some providers, creating the Graph
> > > > instance introduces disk I/O even when the test will be ignored
> > > > because of the feature. That setup/teardown is expensive and ends
> > > > up slowing the tests. If we could cache those somehow and thus
> > > > avoid the Graph instance creation, we'd save some processing - I
> > > > suspect it would be helpful to us internally with Neo4j.
> > > > The trick, of course, is that the Features implementation can't
> > > > be dynamically driven, and there is at least one method that could
> > > > be called on Features that is typically dynamically driven based
> > > > on schema. Very few tests use that, however, so perhaps there is
> > > > some way to work around that problem.
> > > >
> > > > Well, my brain has been dumped. Thoughts welcome.
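To make the -Dcoverage idea discussed above concrete, a profile of the kind Stephen describes (activated by the `coverage` property and wiring in the jacoco-maven-plugin with an aggregated report) could look roughly like the sketch below. This is an illustration of the approach only; the version number, goal bindings, and phase are assumptions, not the actual configuration on his branch:

```xml
<!-- Sketch only: a "coverage" profile activated by -Dcoverage. The
     report-aggregate goal is why a module that depends on the others
     (here, the repurposed gremlin-benchmark) is needed to host it. -->
<profile>
  <id>coverage</id>
  <activation>
    <property><name>coverage</name></property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
        <version>0.7.7.201606060606</version>
        <executions>
          <!-- attach the JaCoCo agent to the surefire JVM -->
          <execution>
            <goals><goal>prepare-agent</goal></goals>
          </execution>
          <!-- roll per-module exec data into one aggregated report -->
          <execution>
            <id>report</id>
            <phase>verify</phase>
            <goals><goal>report-aggregate</goal></goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

Run as `mvn clean install -Dcoverage` to activate the profile.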
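The "GraphProviderSource" proposed in point (3) could take roughly the following shape. This is a hypothetical sketch: `GraphProvider`, `GraphProviderSource`, `TinkerGraphPermutationSource`, and `PermutationRunner` are all illustrative stand-ins rather than TinkerPop or JUnit API, and the configuration key is only an example. The point it shows is that one source enumerates configuration permutations, so a suite no longer needs a separate provider class per permutation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real GraphProvider contract: it hands the
// suite the base configuration used to open a Graph.
interface GraphProvider {
    Map<String, Object> getBaseConfiguration();
}

// The proposed idea: a source produces a list of GraphProvider permutations
// instead of one fixed provider class per configuration.
interface GraphProviderSource {
    List<GraphProvider> providers();
}

// Example source: one provider per id-manager setting, so a suite of N
// tests would execute N * 3 times.
class TinkerGraphPermutationSource implements GraphProviderSource {
    public List<GraphProvider> providers() {
        final List<GraphProvider> result = new ArrayList<>();
        for (final String idManager : Arrays.asList("LONG", "INTEGER", "UUID")) {
            result.add(() -> {
                final Map<String, Object> conf = new HashMap<>();
                conf.put("gremlin.tinkergraph.vertexIdManager", idManager);
                return conf;
            });
        }
        return result;
    }
}

// Toy runner standing in for the JUnit machinery: fan each test out across
// every provider the source produces, returning the total execution count.
class PermutationRunner {
    static int runSuite(final GraphProviderSource source, final List<Runnable> tests) {
        int executions = 0;
        for (final GraphProvider provider : source.providers()) {
            for (final Runnable test : tests) {
                test.run(); // a real runner would first open a Graph from provider.getBaseConfiguration()
                executions++;
            }
        }
        return executions;
    }
}
```

With 2 tests and 3 permutations, the runner performs 6 executions, which is exactly the "4 providers, 4 runs per test" behavior described in the email.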
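Point (5)'s Features caching could be sketched as below, under the assumption the email itself states is required: the provider's Features are static rather than schema-driven. `FeaturesCache` and everything in it are hypothetical names for illustration; the real Graph.Features interface is much richer than the single method shown here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of point (5): cache a provider's Features so the
// suite can evaluate feature-requirement checks without opening a Graph
// (and paying its disk I/O) for tests that will be ignored anyway.
class FeaturesCache {
    // Minimal stand-in for Graph.Features - just enough to decide "ignore?".
    interface Features {
        boolean supportsTransactions();
    }

    private static final Map<String, Features> CACHE = new ConcurrentHashMap<>();
    static int graphOpens = 0; // instrumentation for this sketch only

    // graphOpener simulates the expensive Graph construction we want to
    // skip; it is invoked at most once per provider key.
    static Features featuresFor(final String providerKey, final Supplier<Features> graphOpener) {
        return CACHE.computeIfAbsent(providerKey, k -> {
            graphOpens++;
            return graphOpener.get();
        });
    }
}
```

The second and subsequent lookups for the same provider key return the cached Features without touching the opener, which is the setup/teardown savings the email is after.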