Hi, Lucene and Solr Community.

My name is Igor Wiese, and I am a PhD student from Brazil. In my research
I am investigating two questions: What makes two files change together?
Can we predict when they are going to co-change again?

I've tried to investigate these questions on the Lucene and Solr projects.
I've collected data from issue reports, discussions and commits, and used
machine learning techniques to build a prediction model.

For the Lucene project, I collected a total of 1382 commits in which a pair
of files changed together, and the model correctly predicted 66% of them.
For the Solr project, I collected a total of 111 commits in which a pair of
files changed together, and the model correctly predicted 47% of them.

These were the most useful pieces of information for predicting co-changes
of files:

- number of lines of code added,

- number of lines of code removed,

- sum of the number of lines of code added, modified and removed,

- number of words used to describe and discuss the issues, and

- median value of closeness, a social network measure obtained from issue
comments.
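
For readers unfamiliar with closeness, here is a minimal Python sketch of
how such a value could be computed. It is only an illustration: the network
below (people as nodes, linked when they commented on the same issue) and
all names in it are hypothetical, and the exact network construction used
in the study is not described in this message.

```python
from collections import deque
from statistics import median


def closeness(graph, start):
    """Closeness of `start`: reachable nodes divided by the sum of
    shortest-path distances to them (0.0 if nothing is reachable)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search over the unweighted graph
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical comment network: an edge links two people who commented
# on the same issue.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}

scores = {person: closeness(graph, person) for person in graph}
median_closeness = median(scores.values())
print(scores, median_closeness)
```

A higher value means a person is, on average, fewer steps away from
everyone else in the discussion network.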

To illustrate, consider the following example from our analysis of the
Lucene project. In release 4.7, the files "lucene/index/IndexWriter.java"
and "lucene/index/StandardDirectoryReader.java" changed together in 4
commits. In another 11 commits, only the first file changed. Using
contextual information collected for each commit made to the first file in
the previous release, we were able to predict 3 of the commits in which
both files changed together in release 4.7, with no false positives and
one wrong prediction. For this pair of files, the most important contextual
information was the number of lines of code added in each commit, the
number of words used to describe and discuss the issues, the number of
comments in each issue and the social network metric (closeness) obtained
from issue comments.
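
The numbers in this example can be restated as precision and recall. The
short Python sketch below does the arithmetic; it assumes that the "one
wrong prediction" is the fourth co-change commit that the model missed (a
false negative), which is an interpretation of the figures above rather
than something stated explicitly.

```python
# Counts taken from the IndexWriter / StandardDirectoryReader example.
true_positives = 3   # co-change commits that were predicted
false_positives = 0  # predicted co-changes that did not happen
false_negatives = 1  # assumption: the one wrong prediction is a missed co-change

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under that assumption, every co-change the model announced was real, and it
found three of the four co-changes for this pair of files.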

- Do these results surprise you? Can you think of any explanation for the
results?

- Do you think that our rate of prediction is good enough to be used for
building tool support for the software community?

- Do you have any suggestions on what could be done to improve the change
recommendations?

You can visit these webpages to inspect the results in detail:

Lucene Project: http://flosscoach.com/index.php/17-cochanges/73-lucene
Solr Project: http://flosscoach.com/index.php/17-cochanges/74-solr

All the best,
Igor Wiese
PhD Candidate
