Hi, Cassandra Community.

My name is Igor Wiese, and I am a PhD student from Brazil. I am investigating two
important questions: What makes two files change together? Can we predict
when they are going to co-change again?

I have investigated these questions on the Cassandra project. I collected
data from issue reports, discussions and commits, and used machine learning
techniques to build a prediction model.

I collected a total of 1197 commits in which a pair of files changed
together, and the model correctly predicted 48% of them. The following
pieces of information were the most useful for predicting co-changes of files:

- number of lines of code added,

- number of lines of code removed,

- sum of number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues, and

- median value of closeness, a social network measure obtained from issue
comments.
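
As a rough illustration of what "changed together" means here, the sketch below counts, for every pair of files, the commits that touched both. It assumes each commit is available as a list of the file paths it changed; the function name and the toy history are hypothetical and are not the tooling used for the study.

```python
from collections import Counter
from itertools import combinations


def count_co_changes(commits):
    """Count, for every unordered pair of files, the commits touching both."""
    pair_counts = Counter()
    for files in commits:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a)
        # are counted as the same pair; set() drops duplicate paths.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy history: each commit is the list of files it changed.
commits = [
    ["tools/NodeCmd.java", "tools/NodeProbe.java"],
    ["tools/NodeCmd.java"],
    ["tools/NodeCmd.java", "tools/NodeProbe.java", "README.txt"],
]

counts = count_co_changes(commits)
print(counts[("tools/NodeCmd.java", "tools/NodeProbe.java")])
```

In this toy history the two tool files co-change in two of the three commits, while the second commit changes only the first file.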

To illustrate, consider the following example from our analysis. For
release 1.0, the files "cassandra/tools/NodeCmd.java" and
"cassandra/tools/NodeProbe.java" changed together in 16 commits. In another
6 commits, only the first file changed. Using contextual information
collected for each commit made to the first file in the previous
release, we were able to predict all 13 commits in which both files changed
together in release 1.0, and we issued only 2 false positives. For this
pair of files, the most important contextual information was the number of
lines of code added, removed and modified in each commit, the number of
words used to describe and discuss the issues and the number of comments in
the issues.
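
To put the numbers from this example in terms of precision and recall, here is a minimal sketch. The counts of 13 correct predictions and 2 false positives come from the text above; the recall figure rests on the assumption that "all 13 commits" means no co-change commit was missed, which is an interpretation and not something stated explicitly.

```python
# Counts taken from the NodeCmd.java / NodeProbe.java example above.
true_positives = 13   # co-change commits that were correctly predicted
false_positives = 2   # commits flagged as co-changes that were not

# Precision: share of the issued predictions that were correct.
precision = true_positives / (true_positives + false_positives)

# ASSUMPTION: "all 13 commits" means no co-change commit was missed.
false_negatives = 0
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

Under that assumption, 13 of the 15 issued predictions were correct for this pair of files.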

- Do these results surprise you? Can you think of any explanation for the
results?

- Do you think that our rate of prediction is good enough to be used for
building tool support for the software community?

- Do you have any suggestions on what could be done to improve the change
recommendations?

You can visit our webpage to inspect the results in detail:
http://flosscoach.com/index.php/17-cochanges/66-cassandra

All the best,
Igor Wiese
PhD Candidate
