[ https://issues.apache.org/jira/browse/MAHOUT-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Zhong updated MAHOUT-1958:
------------------------------
    Status: Patch Available  (was: Open)

diff --git a/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java b/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
index 88fbe58..760f10c 100644
--- a/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
+++ b/mr/src/main/java/org/apache/mahout/cf/taste/impl/similarity/CityBlockSimilarity.java
@@ -53,9 +53,9 @@
   @Override
   public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
     DataModel dataModel = getDataModel();
-    int preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
-    int preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2);
-    int intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2);
+    long preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
+    long preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2);
+    long intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2);
     return doSimilarity(preferring1, preferring2, intersection);
   }
@@ -90,8 +90,8 @@
    * @param pref2 number of non-zero values in right vector
    * @param intersection number of overlapping non-zero values
    */
-  private static double doSimilarity(int pref1, int pref2, int intersection) {
-    int distance = pref1 + pref2 - 2 * intersection;
+  private static double doSimilarity(long pref1, long pref2, long intersection) {
+    long distance = pref1 + pref2 - 2 * intersection;
     return 1.0 / (1.0 + distance);
   }

> CityBlockSimilarity.itemSimilarities can overflow
> -------------------------------------------------
>
>                 Key: MAHOUT-1958
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1958
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 1.0.0
>            Reporter: Hao Zhong
>
> The CityBlockSimilarity.itemSimilarities method has the following code:
> {code:title=CityBlockSimilarity.java|borderStyle=solid}
> int preferring2 = dataModel.getNumUsersWithPreferenceFor(itemID2s[i]);
> int intersection = dataModel.getNumUsersWithPreferenceFor(itemID1, itemID2s[i]);
> {code}
> Here, the two methods return long values, and can overflow. Indeed,
> LogLikelihoodSimilarity.doItemSimilarity once had the same problem. The fixed
> code is
> {code:title=LogLikelihoodSimilaritydoItemSimilarity.java|borderStyle=solid}
> long preferring1 = dataModel.getNumUsersWithPreferenceFor(itemID1);
> long numUsers = dataModel.getNumUsers();
> {code}
> Please refer to MAHOUT-738 for details.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
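
A minimal sketch of the kind of overflow the patch guards against. It assumes two items with 1.5 billion preferring users each and no overlap. These counts are hypothetical, chosen only so that their sum exceeds Integer.MAX_VALUE. The class CityBlockOverflowDemo and its two helper methods are illustrative and not part of Mahout. They only mirror the arithmetic of doSimilarity before and after the change from int to long.

```java
class CityBlockOverflowDemo {

  // Mirrors the pre-patch arithmetic: the distance is computed entirely in int,
  // so pref1 + pref2 can wrap around to a negative value.
  static double similarityInt(int pref1, int pref2, int intersection) {
    int distance = pref1 + pref2 - 2 * intersection;
    return 1.0 / (1.0 + distance);
  }

  // Mirrors the patched arithmetic: the same expression evaluated in long.
  static double similarityLong(long pref1, long pref2, long intersection) {
    long distance = pref1 + pref2 - 2 * intersection;
    return 1.0 / (1.0 + distance);
  }

  public static void main(String[] args) {
    // Hypothetical counts; each fits in an int, but their sum does not.
    int pref1 = 1_500_000_000;
    int pref2 = 1_500_000_000;
    int intersection = 0;

    // The int version yields a negative similarity because the sum wraps.
    System.out.println("int arithmetic:  " + similarityInt(pref1, pref2, intersection));
    // The long version yields a small positive similarity, as expected.
    System.out.println("long arithmetic: " + similarityLong(pref1, pref2, intersection));
  }
}
```

With int arithmetic the computed distance is negative, so the similarity falls outside the (0, 1] range that 1 / (1 + distance) is meant to produce. Widening the operands to long before the addition keeps the distance non-negative for any counts that fit in an int.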