Hi,

Code coverage:

I value code coverage as part of the review and development process. As
a project-wide target I think it's not as high value, but having a
standard encourages people to take the time to use the tools, and that's
a healthy side effect. Code coverage is a measure of code execution, not
of whether that code returned the correct answer or generated the right
behavior. We should also document what kind of code coverage is valuable
and how to distinguish between coverage that tests functionality and
coverage that merely exercises it.
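
To make that distinction concrete, here is a hypothetical example (the
class, helper, and test names are made up, not taken from our code
base). Both tests light up the same lines in a coverage report, but only
the second one can fail when the code is wrong:

    import static org.junit.Assert.assertEquals;

    import java.util.ArrayList;
    import java.util.List;

    import org.junit.Test;

    public class RangeSplitterTest
    {
        // Hypothetical helper under test: splits [start, end) into
        // contiguous pieces.
        static List<long[]> split(long start, long end, int pieces)
        {
            List<long[]> ranges = new ArrayList<>();
            long step = (end - start) / pieces;
            for (int i = 0; i < pieces; i++)
            {
                long s = start + i * step;
                long e = (i == pieces - 1) ? end : s + step;
                ranges.add(new long[]{ s, e });
            }
            return ranges;
        }

        // Exercises the code: every line runs and counts as covered,
        // but nothing is asserted, so a wrong answer still passes.
        @Test
        public void exercisesSplit()
        {
            split(0, 100, 3);
        }

        // Tests the code: the same lines run and the behavior is
        // checked, so a regression actually fails.
        @Test
        public void splitCoversTheWholeRangeContiguously()
        {
            List<long[]> ranges = split(0, 100, 3);
            assertEquals(3, ranges.size());
            assertEquals(0, ranges.get(0)[0]);
            assertEquals(100, ranges.get(2)[1]);
            for (int i = 1; i < ranges.size(); i++)
                assertEquals(ranges.get(i - 1)[1], ranges.get(i)[0]);
        }
    }

The first kind of test inflates the number without adding any signal;
the second is the kind of coverage worth asking for in review.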

Tests always passing:

> 2. If a patch has a failing test (test touching the same code path), the code 
> or test should be fixed prior to being accepted.

The reality of tests is that they will degrade over time unless you have
very strict policies on committing that mean no one can commit unless
the tests are consistently passing. We have to do away with "well, it's
failing on trunk, but my branch doesn't make it worse" and "that test
failure looks unrelated."

I think it's all about picking the poor sod who is going to have to drop
what they are doing and commit to fixing a test.

If you give people any kind of out other than doing work that is
unrelated to their main objective, they will take it an unfortunate
amount of the time. That's not a negative statement about our particular
group; it's just the economics of the process. What we have so far is a
good high-level statement about the desired outcome, but what we don't
have is an incentive structure to make that outcome a reality.

I'm not particularly attached to any specific incentive structure, as
long as it convincingly ensures that some people will not be able to
avoid working on the technical debt associated with tests, and that it
spreads the load out in some reasonable way.

I think we should have an automated indication of whether all branches
are open to commit. Just an indicator; it doesn't have to (shouldn't?)
physically enforce anything. That encourages us to express the policy in
a clear, unambiguous way, and it encourages us to build tests and CI to
that policy.
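
As a rough sketch of the kind of indicator I mean (the job name and CI
URL below are placeholders, and this assumes a Jenkins-style JSON API;
it's an illustration, not a worked-out proposal): a tiny program that
looks at whichever job we declare to be "the tests that must pass" and
reports whether trunk is open.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CommitGate
    {
        // Placeholder: whichever CI job encodes "the tests that must
        // pass before trunk is open to commit".
        static final String JOB = "https://ci.example.org/job/trunk-testall";

        public static void main(String[] args) throws Exception
        {
            URL url = new URL(JOB + "/lastCompletedBuild/api/json?tree=result");
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream())))
            {
                for (String line; (line = in.readLine()) != null; )
                    body.append(line);
            }
            // Jenkins reports SUCCESS, UNSTABLE, or FAILURE for the last
            // completed build of a job.
            boolean green = body.toString().contains("\"result\":\"SUCCESS\"");
            System.out.println(green ? "trunk: OPEN to commit"
                                     : "trunk: CLOSED - a test needs fixing first");
        }
    }

The point isn't this particular script; it's that once the policy is
reduced to "this job must be green", the indicator is trivial to build
and everyone sees the same answer.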

It's not perfect. It's not going to distribute the work fairly, but it
will at least get it done. It's also going to be super annoying and
blocking, as you have to artificially delay committing while you wait on
someone to fix a test. You wish you could just commit once fixing the
test has been assigned, but then some of the time the test would remain
broken indefinitely and you would be back where you started. A certain
amount of pain spread wide is, sadly, what makes things happen.

This is one incentive structure that has people stop what they are doing
and what they care about to work on the thing they don't really care
about at the moment. There are other strategies for picking a victim and
making it someone's problem so the tests get fixed, but as an OSS
project it's really going to come down to whoever wants to commit the
most being the one who fixes the tests. Telling people to do stuff
doesn't work so hot.

Alternatively, we can do what we have always done, which is to somewhat
ignore the failing tests (shout out to those working diligently to fix
them in the background) and then just block releases on them. I don't
like this approach because it makes it hard to hit release dates
consistently, and the longer a test remains broken the harder it is to
suss out who broke it and knows how to fix it. It also introduces
overhead for ongoing development intra-release, as every developer has
to parse the current set of failing tests. Overall I see it as
inefficient.

> A recurring failing test carries no signal and is better deleted.

Writing tests isn't free. They are just as much a part of the end
product as the feature. Just because a test doesn't pass doesn't mean it
should be deleted, unless it's a particularly bad test or doesn't
succeed at testing what it is supposed to test. Fixing should be cheaper
than writing from scratch. If we delete tests we should have a policy
for justifying why, rather than giving people carte blanche (or close
enough) to avoid failing tests by deleting them.

If the cheapest path available is to delete tests, it's going to happen
more than it should. Deleting a test should be expensive enough that
it's balanced against the value of fixing it. I just want to make sure
reviewers have enough ammo to push back when they should.

Regards,
Ariel


On Tue, Mar 28, 2017, at 11:53 AM, sankalp kohli wrote:
> If the code coverage goes down or does not go above the required cutoff due
> to adding toString or getter/setter-like functionality, can we make it a
> process to explain the reason in the JIRA before committing it?
> 
> Regarding changes in hard-to-unit-test parts of the code, can we make
> it a process to link the JIRA which talks about rewriting that
> component of C*. Linking JIRAs will help us learn how many changes have
> gone in with little or no testing due to this component.
> 
> On Tue, Mar 28, 2017 at 8:38 AM, Jason Brown <jasedbr...@gmail.com>
> wrote:
> 
> > re: code coverage (this could become its own thread, and probably should)
> >
> > As somebody working on a new large feature (CASSANDRA-8457), I have a few
> > thoughts/observations about code coverage. As a preface, though, I'll point
> > out that we have jacoco (http://www.jacoco.org/jacoco/trunk/index.html)
> > already incorporated into the build, so it's quite easy to generate code
> > coverage reports ('ant codecoverage' runs the tests and creates the jacoco
> > binary, and 'ant jacoco-report' creates an html dump).
> >
> > As CASSANDRA-8457 is largely additive, it's pretty easy for me to confirm
> > the code coverage, as it's a whole new package - so it's pretty obvious
> > what I didn't cover in the test suite. Contrast this against the
> > rest of the code base. It may be harder to discern, from a simple
> > percentage output, just how much of an impact the added tests have across
> > the entire code base. Further, if a given patch adds minor functionality
> > (say a one-liner fix), plus implements some toString() functions, you
> > might actually see the coverage go *down* if the patch doesn't include
> > tests for the toString().
> >
> > Also, for things that just can't be unit tested in the current code base,
> > but where correctness can be ensured with a dtest, those wouldn't get
> > included by coverage tools. Of course, this is where incremental changes to
> > the code base to allow unit testing are really awesome and appreciated.
> > I've been encouraging some folks to do that with their submissions. It
> > makes their patches a bit more involved (for them to code and for me to
> > review), but hopefully it improves the overall quality of the code.
> >
> > Thus, imho, for new/large features, sure, having a high coverage is an
> > awesome thing: I'd think a minimum of 80%, with a "highly recommended"
> > target of > 90%. For smaller fixes, I'm not so sure, but I think a
> > reasonable agreement between the contributor and reviewer, based on the
> > guidelines coming from Blake's forthcoming doc, seems like a good point of
> > departure.
> >
> > On Tue, Mar 28, 2017 at 5:57 AM, Josh McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > Hold on now - there's no reason to throw the baby out with the bathwater
> > > on this. Code coverage on its own is admittedly a questionable metric,
> > > but bad metrics are better than *no* metrics if taken with context.
> > >
> > > Most code coverage tools will allow you to drill down to the granularity
> > > we'd need to confirm that new code (new, as in entirely new file, etc.)
> > > was covered, and to delta against a previously run coverage metric to
> > > make sure there's an uptick in modules where we make changes. It's quite
> > > a bit more involved / extra overhead compared to what we do now; I'd
> > > argue we probably need to learn to crawl before we can walk (as in,
> > > actually write tests for the code that goes into the project and do a
> > > *lot* more jmh benchmarking for perf changes).
> > >
> > > Definitely like the direction this is headed, but it sounds like some
> > > cart-before-horse to me if we integrate point 4 with the rest of them.
> > > Also - for point 2, who's going to do the work to correlate flaky
> > > failures to the changes that introduced them, or are we just going to
> > > accept that flaky tests get added and deal with them retroactively? I
> > > assume we could also set up some kind of CI job that would run a new
> > > test, say, 20 times and greenlight it for patches adding completely new
> > > tests? Might be a bad idea, might not.
> > >
> > > On Tue, Mar 28, 2017 at 1:13 AM, Blake Eggleston <beggles...@apple.com>
> > > wrote:
> > >
> > > > In addition to its test coverage problem, the project has a general
> > > > testability problem, and I think it would be more effective to
> > > > introduce some testing guidelines and standards that drive incremental
> > > > improvement of both, instead of requiring an arbitrary code coverage
> > > > metric be hit, which doesn’t tell the whole story anyway.
> > > >
> > > > It’s not ready yet, but I’ve been putting together a testing standards
> > > > document for the project since bringing it up in the “Code quality,
> > > > principles and rules” email thread a week or so ago.
> > > >
> > > > On March 27, 2017 at 4:51:31 PM, Edward Capriolo (edlinuxg...@gmail.com)
> > > > wrote:
> > > > On Mon, Mar 27, 2017 at 7:03 PM, Josh McKenzie <jmcken...@apache.org>
> > > > wrote:
> > > >
> > > > > How do we plan on verifying #4? Also, root-cause to tie back new code that
> > > > > introduces flaky tests (i.e. passes on commit, fails 5% of the time
> > > > > thereafter) is a non-trivial pursuit (thinking #2 here), and a pretty
> > > > > common problem in this environment.
> > > > >
> > > > > On Mon, Mar 27, 2017 at 6:51 PM, Nate McCall <zznat...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I don't want to lose track of the original idea from François, so
> > > > > > let's do this formally in preparation for a vote. Having this all in
> > > > > > place will make transition to new testing infrastructure more
> > > > > > goal-oriented and keep us more focused moving forward.
> > > > > >
> > > > > > Does anybody have specific feedback/discussion points on the following
> > > > > > (awesome, IMO) proposal:
> > > > > >
> > > > > > Principles:
> > > > > >
> > > > > > 1. Tests always pass. This is the starting point. If we don't care
> > > > > > about test failures, then we should stop writing tests. A recurring
> > > > > > failing test carries no signal and is better deleted.
> > > > > > 2. The code is tested.
> > > > > >
> > > > > > Assuming we can align on these principles, here is a proposal for
> > > > > > their implementation.
> > > > > >
> > > > > > Rules:
> > > > > >
> > > > > > 1. Each new release passes all tests (no flakiness).
> > > > > > 2. If a patch has a failing test (test touching the same code path),
> > > > > > the code or test should be fixed prior to being accepted.
> > > > > > 3. Bug fixes should have one test that fails prior to the fix and
> > > > > > passes after the fix.
> > > > > > 4. New code should have at least 90% test coverage.
> > > > > >
> > > > >
> > > >
> > > > True #4 is hard to verify in the current state. This was mentioned in a
> > > > separate thread: if the code was in submodules, the code coverage tools
> > > > should have less work to do because they typically only count coverage
> > > > for a module and the tests inside that module. At that point it should
> > > > be easy to write a plugin on top of something like this:
> > > > http://alvinalexander.com/blog/post/java/sample-cobertura-ant-build-script
> > > >
> > > > This is also an option:
> > > >
> > > > https://about.sonarqube.com/news/2016/05/02/continuous-analysis-for-oss-projects.html
> > > >
> > >
> >
