Hi Ankur,

Given a data set of two users with their associated browsing and order history:

User 1 browsed: P1, P2, P3, P4; purchased: P2, P3
User 2 browsed: P2, P4, P6; purchased: P2, P4

To support the generation of recommendations, I am now computing the item pairs and the pair count (minsup=0):

<P1, P2> 1
<P1, P3> 1
<P2, P2> 2
<P2, P3> 1
<P2, P4> 1
<P3, P2> 1
<P3, P3> 1
<P4, P2> 2
<P4, P3> 1
<P4, P4> 1
<P6, P2> 1
<P6, P4> 1

This lets me provide a simplistic recommendation:

Browsed P2 -> Purchased: P2 2 times/50%, P3 1 time/25%, P4 1 time/25%

Can you clarify "that can be done as a post step where you remove similar items for a given item that were not purchased"? How would it work in this case?

I'm currently looking into the second option you recommended.

Thanks!
Sebastian

On Apr 15, 2010, at 9:33 AM, Ankur C. Goel wrote:

Sebastian,

The current recommender implementations do not make a distinction between a 'browsed' item and a 'purchased' item when calculating similarity. So that can be done as a post step where you remove similar items for a given item that were not purchased.

The second option is to extend the 'Preference' interface by adding an API to get the type information. You will then also need to provide an appropriate implementation (the default is GenericPreference). You would then add a doMostSimilarPurchasedItems() method to GenericItemBasedRecommender, along with a few other changes. Obviously this is more work.

With the FP mining algorithm, the simplest thing is to just retain itemsets that contain purchased items instead of modifying the algorithm itself. This may result in interesting frequent itemsets where 2 different types of items were browsed and purchased together.

-...@nkur

On 4/15/10 5:40 PM, "Sebastian Feher" <sfe...@crossview.com> wrote:

Robin, Sebastian, Sean, thanks for your responses.
Yes, that is exactly what I am looking for: computing frequent item sets based on co-browse, co-purchase, co-search, user-item ratings and other user-item activities, and then using these frequent item sets to provide recommendations for an active item and/or an active user.

Regarding GenericItemBasedRecommender.mostSimilarItems(): I've used both Tanimoto and a custom similarity function that works the same way as the custom-coded frequent item sets algorithm that I'm trying to replace and test with Mahout.

There are a few questions that I'm not able to answer:

- Do you support cross-type frequent item sets? For example: people who browsed this item ended up purchasing these items. In this case the item pairs are generated by taking one item from the Browse space and the other from the Purchase space. Is this something that can be achieved with the current algorithms (GenericItemBasedRecommender.mostSimilarItems(), FP-Growth) in their existing form? If not, is there an extension mechanism that allows me to do that in a clean fashion, or do I have to modify the algorithm code?

Thanks

On Apr 14, 2010, at 11:46 AM, Sebastian Schelter wrote:

Hi Sebastian,

I can only help you with what GenericItemBasedRecommender.mostSimilarItems() does. It's basically what you know from amazon.com: "People who like this item also like the following items". Mathematically speaking, you have a matrix of the preferences of users towards items, and mostSimilarItems() searches for the highest-ranking item vectors using some similarity function (usually cosine or Pearson correlation).

A good overview of how item-based collaborative filtering works and what the most similar items are can be found in this paper (it helped me understand the whole issue): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9927&rep=rep1&type=pdf

Regards,
Sebastian

Sebastian Feher wrote:

Hi All,

I'm looking at extracting association rules with Mahout.
If I understand it correctly, both GenericItemBasedRecommender.mostSimilarItems() and Parallel FP-Growth seem to provide support for doing that. Is this true? If not, what are the major differences between the two (including scalability and performance)?

Thanks.
Sebastian
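
For reference, the cross-type (browsed -> purchased) pair counting described in the most recent message of this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mahout code: the names `history`, `cross_type_pair_counts`, `recommend` and `keep_purchased` are hypothetical, the data is the two-user toy example from the thread, and `keep_purchased` is just one possible reading of the "post step" (drop similar items that nobody purchased).

```python
from collections import Counter
from itertools import product

# Toy data copied from the thread: per-user browse and purchase history.
history = {
    "user1": {"browsed": ["P1", "P2", "P3", "P4"], "purchased": ["P2", "P3"]},
    "user2": {"browsed": ["P2", "P4", "P6"], "purchased": ["P2", "P4"]},
}


def cross_type_pair_counts(history):
    """Count (browsed, purchased) pairs, at most once per user per pair."""
    counts = Counter()
    for record in history.values():
        # One item from the browse space, one from the purchase space.
        counts.update(set(product(record["browsed"], record["purchased"])))
    return counts


def recommend(counts, browsed_item, minsup=0):
    """Return (purchased item, count, share) rows for one browsed item."""
    rows = {
        purchased: count
        for (browsed, purchased), count in counts.items()
        if browsed == browsed_item and count >= minsup
    }
    total = sum(rows.values())
    # Highest count first; ties broken by item id for a stable order.
    ordered = sorted(rows.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, count, count / total) for item, count in ordered]


def keep_purchased(similar_items, history):
    """Assumed post step: keep only similar items that were purchased by someone."""
    purchased = {p for record in history.values() for p in record["purchased"]}
    return [item for item in similar_items if item in purchased]


counts = cross_type_pair_counts(history)
for item, count, share in recommend(counts, "P2"):
    print(f"Browsed P2 -> Purchased {item}: {count} time(s), {share:.0%}")
```

On the toy data this yields the same twelve pairs and the same 50%/25%/25% split for P2 as listed in the thread. The post step would take whatever list a similarity-based recommender returns for an item and pass it through `keep_purchased` before presenting it.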