[
https://issues.apache.org/jira/browse/CASSANDRA-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Russ Hatch updated CASSANDRA-8954:
----------------------------------
Description:
Some changes to source are much riskier than others, and we can analyze data
from JIRA + git to make educated guesses about risk level. This is a
backward-looking technique with limitations, but it may still be useful (yes,
the past does not equal the future!).
(disclaimer: I did not come up with this technique).
The executive summary: 1) correlate changes with defects, by code unit such as
filename; 2) quantify the risk of new patches by combining that correlation
with a measure of change "size", as (correlation * change_size).
The basic idea is to build a tool which correlates past Defect tickets to the
files which were changed to fix them. If a Defect required changes to specific
files to fix, then in some sense past changes to those files (or their original
implementations) were problematic. Therefore, future changes to those files
carry some potential risk as well.
This requires getting an occasional dump of Defect type issues, and an
occasional dump of commit messages. Defects would have to be associated with
commits based on a text search of commit messages. From there we build a
weighted model of which source files get touched the most to fix defects (say,
giving each file name a ranking of 1 to 10, where 10 carries the most risk).
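A minimal sketch of that correlation step, in Python. Everything concrete here
is an assumption for illustration: the ticket keys, the commit records (which
in practice might come from a JIRA export and something like
`git log --name-only`), the helper names `defect_fix_counts` and `rank_files`,
and the choice to scale ranks linearly against the most-fixed file.

```python
import math
import re
from collections import Counter

# Hypothetical sample data standing in for the JIRA and git dumps.
defect_keys = {"CASSANDRA-1001", "CASSANDRA-1002"}
commits = [
    {"message": "Fix flush race (CASSANDRA-1001)",
     "files": ["src/Memtable.java", "src/Flush.java"]},
    {"message": "CASSANDRA-1002: handle empty memtable",
     "files": ["src/Memtable.java"]},
    {"message": "New feature CASSANDRA-2000",  # not a Defect ticket
     "files": ["src/Flush.java", "src/Cli.java"]},
    {"message": "fix typo", "files": ["README.txt"]},
]

KEY_PATTERN = re.compile(r"CASSANDRA-\d+")


def defect_fix_counts(commits, defect_keys):
    """Count, per file, how many defect-fixing commits touched it."""
    counts = Counter()
    for commit in commits:
        # Text search of the commit message for ticket keys.
        keys = set(KEY_PATTERN.findall(commit["message"]))
        if keys & defect_keys:
            counts.update(set(commit["files"]))
    return counts


def rank_files(counts, high=10):
    """Scale counts to a 1..high ranking; the most-fixed file gets `high`."""
    if not counts:
        return {}
    top = max(counts.values())
    return {path: math.ceil(high * n / top) for path, n in counts.items()}


counts = defect_fix_counts(commits, defect_keys)
weights = rank_files(counts)
print(weights)
```

A linear scale is only one option; a skewed distribution of fix counts might
be better served by percentile buckets.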
To analyze specific patches going forward, we look at the defect weight for
each source file the patch touches, and factor in a metric for the patch's
changes in that file (maybe (lines changed/total lines), OR (change in
cyclomatic complexity/total complexity)). Out of this we get a number
representing a theoretical risk.
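As a rough sketch of the scoring step, using the (lines changed/total lines)
variant. The function name `patch_risk`, the sample weights and patch, the
default weight of 1 for files with no defect history, and summing per-file
scores into one patch total are all assumptions made for illustration.

```python
def patch_risk(file_weights, patch, default_weight=1):
    """Return per-file risk as weight * (lines changed / total lines).

    `patch` maps a file path to (lines_changed, total_lines).
    """
    risk = {}
    for path, (changed, total) in patch.items():
        weight = file_weights.get(path, default_weight)
        # Treat a brand-new (empty) file as fully changed, and cap the
        # ratio at 1.0 in case added + deleted lines exceed the file size.
        change_size = min(1.0, changed / total) if total else 1.0
        risk[path] = weight * change_size
    return risk


# Hypothetical 1-10 weights from the defect history, and a sample patch.
file_weights = {"src/Memtable.java": 10, "src/Flush.java": 5}
patch = {
    "src/Memtable.java": (50, 500),  # high-risk file, 10% changed
    "src/Cli.java": (20, 200),       # no defect history, 10% changed
}

scores = patch_risk(file_weights, patch)
total_risk = sum(scores.values())
print(scores, total_risk)
```

Here the same 10% change scores ten times higher in the file with the heavier
defect history, which is the behavior the (correlation * change_size) formula
is meant to capture.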
was:
Some changes to source are much more risky than others, and we can analyze data
from JIRA + git to make educated guesses about risk level. This is a backwards
looking technique with limitations but still may be useful (yes, the past does
not equal the future!).
(disclaimer: I did not come up with this technique).
The basic idea is to build a tool which correlates past Defect tickets to the
files which were changed to fix them. If a Defect required changes to specific
files to fix, then in some sense past changes to those files (or their original
implementations) were problematic. Therefore, future changes to those files
carry some potential risk as well.
This requires getting an occasional dump of Defect type issues, and an
occasional dump of commit messages. Defects would have to be associated to
commits based on a text search of commit messages. From there we build a
weighted model of which source files get touched the most to fix defects (say
giving each file name a ranking of 1 to 10 where 10 carries the most risk).
To analyze specific patches going forward we look at the defect weight for that
source file, and factor in a metric for a patch's changes in that file (maybe
(lines changed/total lines), OR (change in cyclomatic complexity/total
complexity)). Out of this we get a number representing a theoretical risk.
> risk analysis of patches based on past defects
> ----------------------------------------------
>
> Key: CASSANDRA-8954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8954
> Project: Cassandra
> Issue Type: Test
> Reporter: Russ Hatch
> Assignee: Russ Hatch
>