Yeah, if you squint hard enough, many of these algorithms turn out to be quite similar, or applicable in similar situations. They're specializations or recombinations of the same ideas for different problem domains.

Because you've said the words "association rules", stuff like FP growth sounds more appropriate. But I can describe what mostSimilarItems() does in case it happens to suit you better.

It just returns the items with highest similarity to a given item, where 'similarity' is defined by a given ItemSimilarity implementation. Using an implementation like LogLikelihoodSimilarity, you could easily discover items which co-occur unusually frequently. Or with PearsonCorrelationSimilarity you could base the similarity measure on traditional correlation of ratings -- if you have item ratings.

You could copy-and-paste this method and modify it to simply discover the item-item pairs with highest similarity over all pairs. It's very simple.

The good and bad news about this method is it's not distributed. If your data is medium-sized -- here my rule of thumb is roughly less than 100M data points -- I bet it'll suit you fine to run a non-distributed job based on this bit of code to do your work.

If you need a distributed solution... well you could pick out the map-reduce phase in org.apache.mahout.cf.taste.hadoop.item which computes co-occurrence and then write a second job to pick out the highest co-occurrences. Very simple and quick as map-reduces go.

On Wed, Apr 14, 2010 at 3:20 PM, Sebastian Feher <sfe...@crossview.com> wrote:
> Hi All,
>
> I'm looking at extracting association rules with Mahout. If I understand it
> correctly, both GenericItemBasedRecommender.mostSimilarItems() and Parallel
> FP-Growth seem to provide support for doing that. Is this true? If not what
> are the major differences between the two (including scalability,
> performance)? Thanks.
>
> Sebastian
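
For what it's worth, here is a rough, self-contained sketch of the non-distributed idea from the reply: score every item-item pair and keep the highest. This is not Mahout code and does not call mostSimilarItems(); the class TopItemPairs, its methods, and the sample data are all made up for illustration. It assumes boolean data (who touched which item, no ratings) that fits in memory, and it scores pairs with a log-likelihood ratio over a 2x2 co-occurrence table, which is the kind of statistic a log-likelihood similarity is built on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Mahout.
class TopItemPairs {

    /** One scored item-item pair. */
    static final class ScoredPair {
        final String itemA;
        final String itemB;
        final double score;
        ScoredPair(String itemA, String itemB, double score) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.score = score;
        }
    }

    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 table:
    // k11 = users with both items, k12 = only A, k21 = only B, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
    }

    // Scores all pairs that co-occur at least once and returns the best ones.
    // Note: a high ratio means "surprising", so in general it can also flag
    // pairs that co-occur less often than expected.
    static List<ScoredPair> topPairs(Map<String, Set<String>> usersByItem,
                                     int totalUsers, int howMany) {
        // Sort item IDs so the result order is deterministic.
        List<String> items = new ArrayList<>(new TreeSet<>(usersByItem.keySet()));
        List<ScoredPair> scored = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                Set<String> a = usersByItem.get(items.get(i));
                Set<String> b = usersByItem.get(items.get(j));
                Set<String> both = new HashSet<>(a);
                both.retainAll(b);
                long k11 = both.size();
                if (k11 == 0) {
                    continue; // never seen together
                }
                long k12 = a.size() - k11;
                long k21 = b.size() - k11;
                long k22 = totalUsers - k11 - k12 - k21;
                scored.add(new ScoredPair(items.get(i), items.get(j),
                        logLikelihoodRatio(k11, k12, k21, k22)));
            }
        }
        scored.sort(Comparator.comparingDouble((ScoredPair p) -> p.score).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(howMany, scored.size())));
    }

    // Made-up sample data: item -> set of users who have it (6 users in total).
    static Map<String, Set<String>> sampleData() {
        Map<String, Set<String>> data = new HashMap<>();
        data.put("bread", new HashSet<>(Arrays.asList("u1", "u2", "u3")));
        data.put("butter", new HashSet<>(Arrays.asList("u1", "u2", "u3")));
        data.put("milk", new HashSet<>(Arrays.asList("u1", "u4", "u5", "u6")));
        return data;
    }

    public static void main(String[] args) {
        for (ScoredPair p : topPairs(sampleData(), 6, 2)) {
            System.out.println(p.itemA + " + " + p.itemB + " -> " + p.score);
        }
    }
}
```

The all-pairs loop is quadratic in the number of items, which is the price of keeping it simple; that is usually tolerable at the medium data sizes mentioned in the reply, and it is the part a distributed co-occurrence job would replace.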