Hi, Derby Community.

My name is Igor Wiese, and I am a PhD student from Brazil. In my research I am investigating two important questions: What makes two files change together? Can we predict when they are going to co-change again?
I have investigated these questions on the Derby project. I collected data from issue reports, discussions, and commits, and used machine learning techniques to build a prediction model. I collected a total of 5266 commits in which a pair of files changed together, and the model correctly predicted 86% of these commits.

The most useful information for predicting co-changes of files was:

- number of lines of code added,
- number of lines of code removed,
- sum of the number of lines of code added, modified, and removed,
- number of words used to describe and discuss the issues, and
- median value of closeness, a social network measure obtained from issue comments.

To illustrate, consider the following example from our analysis. For release 10.10, the files "sql/catalog/DataDictionaryImpl.java" and "impl/storeless/EmptyDictionary.java" changed together in 7 commits. In another 4 commits, only the first file changed, but not the second. By collecting contextual information for each commit made to the first file in the previous release, we were able to predict all 7 commits in which both files changed together in release 10.10, and we issued only 2 wrong predictions. For this pair of files, the most important contextual information was the number of lines of code added, removed, and modified in each commit, and a social network measure (constraint) obtained from issue comments.

- Do these results surprise you? Can you think of any explanation for them?
- Do you think that our prediction rate is good enough to be used for building tool support for the software community?
- Do you have any suggestions on what can be done to improve the change recommendation?

You can visit our webpage to inspect the results in detail: http://flosscoach.com/index.php/17-cochanges/69-derby

All the best,
Igor Wiese
PhD Candidate
