Hi Alejandro, I won't comment on the issue itself (I am sure Sean and others will), since I haven't looked at the code, but https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute describes how to submit a patch. File a ticket in JIRA and provide the patch along with your test cases.
-Grant

On Apr 6, 2011, at 10:13 AM, Alejandro Bellogin Kouki wrote:

> Hi all,
>
> I've been using Mahout for many years now, mainly for my Master's thesis, and
> now for my PhD thesis. That is why, first, I want to congratulate you on the
> effort of releasing such a library as open source.
>
> At this point, my main concern is recommendation, and, because of that, I
> have been using the different recommenders, evaluators and similarities
> implemented in the library. However, today, after inspecting your code many
> times, I have found, IMHO, a relevant bug with further implications.
>
> It is related to the computation of the similarity. Although this is not
> the only implemented similarity, Pearson's correlation is one of the most
> popular ones. This similarity requires normalising (or "centering") the data
> using the user's mean, in order to be able to distinguish a user who usually
> rates items with 5's from a user who usually rates them with 3's, even though
> both rated a particular item with a 5. The problem is that the users'
> means are being calculated using ONLY the items in common between the two
> users, leading to strange similarity computations (or worse, to no similarity
> at all!). It is not difficult to find small examples showing this behaviour;
> besides, seminal papers assume the overall mean rating is used [1, 2].
>
> Since I am new to this patch and bug/fix terminology, I would like to
> know what is the best (or the only?) way of submitting this finding. I have to
> say that I have already fixed the code (it affects the AbstractSimilarity
> class, and therefore, it would have an impact on other similarity functions
> too).
>
> Best regards,
> Alejandro
>
> [1] M. J. Pazzani: "A framework for collaborative, content-based and
> demographic filtering". Artificial Intelligence Review 13, pp. 393-408. 1999
> [2] C. Desrosiers, G. Karypis: "A comprehensive survey of neighborhood-based
> recommendation methods". Recommender Systems Handbook, chapter 4. 2010
>
> --
> Alejandro Bellogin Kouki
> http://rincon.uam.es/dir?cw=435275268554687

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org
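
For anyone following the thread, here is a minimal sketch of the behaviour Alejandro describes. It is plain Python, not Mahout code: the `pearson` function, the `use_overall_mean` flag and the two example users are hypothetical and exist only for illustration. It also assumes the sums in the numerator and denominator always run over the co-rated items, so that only the choice of mean changes between the two variants.

```python
import math


def pearson(ratings_a, ratings_b, use_overall_mean):
    """Pearson correlation between two users' ratings (dict: item -> rating).

    Returns None when the correlation is undefined, i.e. when the users
    share no items or one of them has zero variance after centering.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if not common:
        return None

    if use_overall_mean:
        # Center on each user's mean over ALL the items they rated.
        mean_a = sum(ratings_a.values()) / len(ratings_a)
        mean_b = sum(ratings_b.values()) / len(ratings_b)
    else:
        # Center on each user's mean over the co-rated items only.
        mean_a = sum(ratings_a[i] for i in common) / len(common)
        mean_b = sum(ratings_b[i] for i in common) / len(common)

    dev_a = [ratings_a[i] - mean_a for i in common]
    dev_b = [ratings_b[i] - mean_b for i in common]

    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical users: both gave a 5 to the two items they have in common.
alice = {"i1": 5, "i2": 5, "i3": 1, "i4": 1}
bob = {"i1": 5, "i2": 5, "i5": 3, "i6": 3}

common_only = pearson(alice, bob, use_overall_mean=False)
overall = pearson(alice, bob, use_overall_mean=True)
print(common_only, overall)
```

With means taken over the co-rated items only, both users have zero variance on the shared items and the correlation is undefined ("no similarity at all"). With overall means, the same two users get a defined value.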
