Re: Measuring Release Quality

2018-09-20 Thread Benedict Elliott Smith
I think it would be great to start getting some high-quality info out of JIRA, 
but we need to clean up and standardise how we use it to facilitate this.

Take the Component field as an example.  This is the current list of options:

4.0
Auth
Build
Compaction
Configuration
Core
CQL
Distributed Metadata
Documentation and Website
Hints
Libraries
Lifecycle
Local Write-Read Paths
Materialized Views
Metrics
Observability
Packaging
Repair
SASI
Secondary Indexes
Streaming and Messaging
Stress
Testing
Tools

In some cases there’s duplication (Metrics + Observability, Coordination 
(= “Storage Proxy, Hints, Batchlog, Counters…”) + Hints, Local Write-Read Paths 
+ Core).
In others, there’s a lack of granularity (Streaming + Messaging, Core, 
Coordination, Distributed Metadata).
In others, there’s a lack of clarity (Core, Lifecycle, Coordination).
Others are probably missing entirely (Transient Replication, …?).

Labels are also used fairly haphazardly, and there’s no clear definition of 
“priority”.

Perhaps we should form a working group to suggest a methodology for filling out 
JIRA, standardise the necessary components and labels, etc., and put together a 
wiki page with step-by-step instructions on how to do it?
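
Purely to illustrate the kind of audit such a group might start from, here’s a 
throwaway sketch – hypothetical, not something that exists today – that counts 
how each of the current Component values is actually used, via the public JIRA 
REST search API:

import requests

SEARCH = "https://issues.apache.org/jira/rest/api/2/search"
COMPONENTS = [
    "4.0", "Auth", "Build", "Compaction", "Configuration", "Core", "CQL",
    "Distributed Metadata", "Documentation and Website", "Hints", "Libraries",
    "Lifecycle", "Local Write-Read Paths", "Materialized Views", "Metrics",
    "Observability", "Packaging", "Repair", "SASI", "Secondary Indexes",
    "Streaming and Messaging", "Stress", "Testing", "Tools",
]

# Count issues tagged with each current Component value (illustrative only).
for component in COMPONENTS:
    jql = f'project = CASSANDRA AND component = "{component}"'
    resp = requests.get(SEARCH, params={"jql": jql, "maxResults": 0})
    print(f'{component}: {resp.json()["total"]} issues')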


Re: Measuring Release Quality

2018-09-20 Thread Joshua McKenzie
I've spent a good bit of time thinking about the above, both bouncing off 
different ways to measure quality and progress and trying to influence 
community behavior on this topic. My advice: start small and simple (KISS, 
YAGNI, all that). Get pass/fail and flakiness metrics for utest/dtest over 
time, and perhaps also aggregate bug count by component over time. After 
spending a predetermined time doing that (a couple of months?) as an 
experiment, we can retrospect as a project and see whether these efforts are 
adding value commensurate with the time investment required to perform the 
measurement and analysis.

There are a lot of really good ideas in that linked wiki article and this email 
thread. The biggest challenge, and the biggest risk of failure, is in 
translating good ideas into action and selling project participants on the 
value of changing their behavior. The latter is where we've fallen short over 
the years; building consensus (especially regarding process /shudder) is Very 
Hard.

Also - thanks for spearheading this discussion, Scott. It's one we come back to 
with some regularity, so there's real pain and opportunity here for the 
project, imo.

Measuring Release Quality

2018-09-19 Thread Scott Andreas
Hi everyone,

Now that many teams have begun testing and validating Apache Cassandra 4.0, 
it’s useful to think about what “progress” looks like. While metrics alone may 
not tell us what “done” means, they do help us answer the question, “are we 
getting better or worse — and how quickly”?

A friend described to me a few attributes of metrics he considered useful, 
suggesting that good metrics are actionable, visible, predictive, and 
consequent:

– Actionable: We know what to do based on them – where to invest, what to fix, 
what’s fine, etc.
– Visible: Everyone who has a stake in a metric has full visibility into it and 
participates in its definition.
– Predictive: Good metrics enable forecasting of outcomes – e.g., “consistent 
performance test results against build abc predict an x% reduction in 99%ile 
read latency for this workload in prod”.
– Consequent: We take actions based on them (e.g., not shipping if tests are 
failing).

Here are some notes in Confluence toward metrics that may be useful to track 
beginning in this phase of the development + release cycle. I’m interested in 
your thoughts on these. They’re also copied inline for easier reading in your 
mail client.

Link: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=93324430

Cheers,

– Scott

––––––

Measuring Release Quality:

[ This document is a draft + sketch of ideas. It is located in the "discussion" 
section of this wiki to indicate that it is an active draft – not a document 
that has been voted on, achieved consensus, or in any way official. ]

Introduction:

This document outlines a series of metrics that may be useful for measuring 
release quality and for quantifying progress during the testing / validation 
phase of the Apache Cassandra 4.0 release cycle.

The goal of this document is to think through what we should consider measuring 
to quantify our progress testing and validating Apache Cassandra 4.0. This 
document explicitly does not discuss release criteria – though metrics may be a 
useful input to a discussion on that topic.


Metric: Build / Test Health (produced via CI, recorded in Confluence):

Bread-and-butter metrics intended to capture baseline build health and 
flakiness in the test suite, presented as a time series to understand how 
they’ve changed from build to build and release to release (a rough sketch of 
deriving these from CI output follows the list below):

Metrics:

– Pass / fail metrics for unit tests
– Pass / fail metrics for dtests
– Flakiness stats for unit and dtests
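
As a concrete starting point, something like the following could roll these up 
from JUnit-style XML reports. This is a minimal sketch only – it assumes CI 
output collected per build under a hypothetical ci-results/<build-id>/*.xml 
layout, and is not an existing tool:

from collections import defaultdict
from pathlib import Path
import xml.etree.ElementTree as ET

def load_results(root="ci-results"):
    """Return {build_id: {test_name: passed?}} from JUnit-style XML reports."""
    results = defaultdict(dict)
    for report in Path(root).glob("*/*.xml"):
        build = report.parent.name
        for case in ET.parse(report).getroot().iter("testcase"):
            name = f'{case.get("classname")}.{case.get("name")}'
            failed = (case.find("failure") is not None
                      or case.find("error") is not None)
            results[build][name] = not failed
    return results

def summarize(results):
    # Pass/fail per build, reported as a time series.
    for build, cases in sorted(results.items()):
        print(f"{build}: {sum(cases.values())}/{len(cases)} passed")
    # Call a test "flaky" if it both passed and failed within the window.
    outcomes = defaultdict(set)
    for cases in results.values():
        for name, ok in cases.items():
            outcomes[name].add(ok)
    flaky = sorted(n for n, seen in outcomes.items() if len(seen) == 2)
    print(f"flaky tests in window: {len(flaky)}")

summarize(load_results())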


Metric: “Found Bug” Count by Methodology (sourced via JQL, reported in 
Confluence):

These are intended to help us understand the efficacy of each methodology being 
applied. We might consider annotating bugs found in JIRA with the methodology 
that produced them (e.g. via a label). That annotation could then be consumed 
as input to a JQL query and reported on the Confluence dev wiki.
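
For illustration only – the label convention below (e.g. “found-by-fuzzing”) 
does not exist today and would need to be agreed on – the JQL behind such a 
report might look like:

# Hypothetical label convention; each query could back a saved JIRA filter
# or a Jira issues macro on the Confluence page.
METHODOLOGIES = [
    "fuzzing", "replay", "upgrade-diff", "performance",
    "shadow-traffic", "new-coverage", "source-audit",
]

def found_by_jql(methodology, days=30):
    return (f"project = CASSANDRA AND issuetype = Bug "
            f"AND labels = found-by-{methodology} "
            f"AND created >= -{days}d")

for m in METHODOLOGIES:
    print(found_by_jql(m))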

As we reach a Pareto-optimal level of investment in a methodology, we’d expect 
to see its found-bug rate taper. As we achieve higher quality across the board, 
we’d expect to see found-bug counts taper across all methodologies. If one or 
two approaches are outliers, that could indicate the utility of doubling down 
on those particular forms of testing.

We might consider reporting “Found By” counts for methodologies such as:

– Property-based / fuzz testing
– Replay testing
– Upgrade / Diff testing
– Performance testing
– Shadow traffic
– Unit/dtest coverage of new areas
– Source audit


Metric: “Found Bug” Count by Subsystem/Component (sourced via JQL, reported in 
Confluence):

Similar to “found by,” but “found where.” These metrics help us understand 
which components or subsystems of the database we’re finding issues in. In the 
event that a particular area stands out as “hot,” we’ll have the quantitative 
feedback we need to support investment there. Tracking these counts over time – 
and their first derivative, the rate – also helps us make statements about 
progress in various subsystems. Though we can’t prove a negative (“no bugs have 
been found, therefore there are no bugs”), we gain confidence as the rate of 
found bugs, normalized to the effort we’re putting in, decreases.
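
A sketch of how these counts and their weekly rate could be derived from JIRA – 
hypothetical; the paging and bucketing here are just one way to do it:

from collections import Counter, defaultdict
from datetime import datetime
import requests

SEARCH = "https://issues.apache.org/jira/rest/api/2/search"

def recent_bugs(days=90):
    """Yield (component name, created date) for recent CASSANDRA bugs."""
    jql = f"project = CASSANDRA AND issuetype = Bug AND created >= -{days}d"
    start = 0
    while True:
        resp = requests.get(SEARCH, params={
            "jql": jql, "fields": "components,created",
            "startAt": start, "maxResults": 100})
        issues = resp.json()["issues"]
        if not issues:
            break
        for issue in issues:
            created = datetime.strptime(
                issue["fields"]["created"][:10], "%Y-%m-%d").date()
            for comp in issue["fields"]["components"] or [{"name": "None"}]:
                yield comp["name"], created
        start += len(issues)

totals, weekly = Counter(), defaultdict(Counter)
for component, created in recent_bugs():
    totals[component] += 1
    weekly[component][created.isocalendar()[1]] += 1  # ISO week number
for component, total in totals.most_common():
    print(component, total, dict(sorted(weekly[component].items())))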

We might consider reporting “Found In” counts for components as enumerated in 
JIRA, such as:
– Auth
– Build
– Compaction
– Compression
– Core
– CQL
– Distributed Metadata
– …and so on.


Metric: “Found Bug” Count by Severity (sourced via JQL, reported in Confluence)

Similar to “found by/where,” but “how bad”? These metrics help us understand 
the severity of the issues we encounter. As build quality improves, we would 
expect to see decreases in the severity of issues identified. A high rate of 
critical issues identified late in the release cycle would be cause for 
concern, though it may be expected at an earlier time.

These could roughly be sourced from the “Priority” field in JIRA:
– Trivial
– Minor
– Major
– Critical
– Blocker

While “priority” doesn’t map directly to “severity,” it may be a useful proxy. 
Altern