Hi MG. Thanks for the Portuguese :-)

I really enjoyed your example. I don't know much about the Lucene/Solr
architecture, but I completely agree. There is probably a design
problem in this case, because the classes seem to be "related". But for some
of the file pairs that we tested, we couldn't make assumptions, because it is
not clear why the classes changed together. We probably need to manually
inspect the set of issues in which the files changed together to find the
"reason". In some cases, that could be very difficult without enough
know-how of the project.
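
As a rough illustration of that inspection step, here is a minimal Python
sketch that lists the commits in which two files changed together. The
commit IDs, the commit-to-files mapping and the `cochange_commits` helper are
all hypothetical and only for illustration; this is not the tool itself and
the data is not real Lucene history.

```python
from typing import Dict, List, Set


def cochange_commits(commits: Dict[str, Set[str]], file_a: str, file_b: str) -> List[str]:
    """Return the sorted IDs of commits that touched both file_a and file_b."""
    return sorted(
        commit_id
        for commit_id, files in commits.items()
        if file_a in files and file_b in files
    )


# Hypothetical commit -> changed-files data (assumption, not real history).
commits = {
    "c1": {"lucene/index/IndexWriter.java", "lucene/index/StandardDirectoryReader.java"},
    "c2": {"lucene/index/IndexWriter.java"},
    "c3": {
        "lucene/index/IndexWriter.java",
        "lucene/index/StandardDirectoryReader.java",
        "CHANGES.txt",
    },
}

shared = cochange_commits(
    commits,
    "lucene/index/IndexWriter.java",
    "lucene/index/StandardDirectoryReader.java",
)
print(shared)
```

The returned commit IDs are the ones whose linked issues would then need to
be read by hand to find the "reason" for the co-change.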

The good point is that "for a newcomer", for example, it would be hard to
find the relation that you mentioned. In such cases we could help :). Do
you agree?

I really liked the idea of a "maven plugin". We are creating a tool like a
"web service" that could be integrated with the issue tracker, but I
really liked your idea. I will think about it. Thanks!

We probably couldn't predict with 100% accuracy in all cases :-). On
average, as I mentioned, for Lucene we tested more than 1000 commits with
66% accuracy. For Solr the accuracy was lower (47%). The reason for this
lower accuracy in Solr is probably related to the number of commits that we
used to build the prediction models. We used about 10x fewer commits for
Solr (111) than for Lucene (1382).

Considering that out of every 4 commits, we could give you a good
recommendation to change two files together in 3 of them, is that good? Do
you think that could "save" you time in finding the correct files to
complete the change?
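
To show where a "3 out of 4" figure can come from, here is a small Python
sketch using the counts quoted below for the IndexWriter /
StandardDirectoryReader pair in release 4.7: 4 co-change commits, 3 of them
predicted, 0 false positives. Reading the "one wrong prediction" as a missed
co-change (a false negative) is an assumption, and the variable names are
only illustrative.

```python
# Counts taken from the release 4.7 example for the file pair.
true_positives = 3   # co-change commits that were predicted
false_positives = 0  # predicted co-changes that did not happen
false_negatives = 1  # assumption: the "one wrong prediction" was a missed co-change

# Precision: share of issued recommendations that were correct.
precision = true_positives / (true_positives + false_positives)

# Recall: share of real co-change commits that were recommended.
recall = true_positives / (true_positives + false_negatives)

print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under that assumption, every recommendation issued for this pair was
correct, and 3 of the 4 real co-changes were caught.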

Thanks again, MG
All the best,
Igor Wiese


2015-12-10 11:21 GMT-02:00 Martin Gainty <mgai...@hotmail.com>:

>
>
>
>
> ------------------------------
> From: igor.wi...@gmail.com
> Date: Wed, 9 Dec 2015 23:48:10 +0000
> Subject: Feedback of my Phd work in Lucene and Solr project
> To: dev@lucene.apache.org
>
> Hi, Lucene and Solr Community.
>
> My name is Igor Wiese, a PhD student from Brazil. In my research I am
> investigating two important questions: What makes two files change
> together? Can we predict when they are going to co-change again?
>
> I've tried to investigate these questions on the Lucene and Solr projects.
> I've collected data from issue reports, discussions and commits, and used
> some machine learning techniques to build a prediction model.
>
> I collected a total of 1382 commits in which a pair of files changed
> together and could correctly predict 66% of them in the Lucene project. For
> the Solr project I collected a total of 111 commits in which a pair of
> files changed together and could correctly predict 47% of them.
>
> These were the most useful pieces of information for predicting co-changes
> of files:
>
> - number of lines of code added,
>
> - number of lines of code removed,
>
> - sum of number of lines of code added, modified and removed,
>
> - number of words used to describe and discuss the issues, and
>
> - median value of closeness, a social network measure obtained from issue
> comments.
>
> To illustrate, consider the following example in Lucene Project from our
> analysis. For release 4.7, the files "lucene/index/IndexWriter.java" and
> "lucene/index/StandardDirectoryReader.java" changed together in 4 commits.
> In another 11 commits, only the first file changed, but not the second.
> Collecting contextual information for each commit made to the first file in
> the previous release, we were able to predict 3 commits in which both files
> changed together in release 4.7, with 0 false positives and one
> wrong prediction. For this pair of files, the most important contextual
> information was the number of lines of code added in each commit, the
> number of words used to describe and discuss the issues, the number of
> comments in each issue and the social network metric (closeness) obtained
> from issue comments.
>
> MG>if the pairing was 100% accurate then yes a predictor for both files
> changing indicates a design issue is lurking i.e
> MG>IndexWriter and StandardDirectoryReader "share functionality" which
> would suggest breaking shared methods to interface
> MG>refactoring IndexWriter and StandardDirectoryReader to each implement
> that shared Interface
> MG>if attributes are to be shared then perhaps an abstract class should be
> created to contain those shared attributes and implement
> MG>the shared methods
> MG>refactoring IndexWriter and StandardDirectoryReader to extend the
> abstract class should force implementor to override/reuse
> MG>shared attributes in the Abstract Base Class?
>
> - Do these results surprise you? Can you think of any explanation for the
> results?
>
> - Do you think that our rate of prediction is good enough to be used for
> building tool support for the software community?
>
> MG>if the plugin can predict with 100% accuracy?
>
> - Do you have any suggestion on what can be done to improve the change
> recommendation?
>
> MG>create the tool as a maven plugin so we can bind this functionality to
> one of the pre compile phases e.g. process-sources?
>
> You can visit a webpage to inspect the results in details:
>
> Lucene Project: http://flosscoach.com/index.php/17-cochanges/73-lucene
> Solr Project: http://flosscoach.com/index.php/17-cochanges/74-solr
>
> All the best,
> Igor Wiese
> Phd Candidate
>
> MG>Obrigado do EEUU
>



-- 
=================================
Igor Scaliante Wiese
PhD Candidate - Computer Science @ IME/USP
Faculty in Dept. of Computing at Universidade Tecnológica Federal do Paraná
