Sorry for the late reply -- I've been traveling.

On Fri, Sep 26, 2008 at 6:52 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I've been reading the chapter on recommendations in Programming Collective
> Intelligence and looking at Taste. The examples in PCI

(PS that is a really good book. Recommended -- highly recommended -- to everyone involved with Mahout. I kinda cross-checked what I had done against the book and think it agrees. The book suggested more good ideas, particularly the Tanimoto coefficient business.)

> I can't really use Euclidean distance or Pearson correlation coefficient, can
> I?

You could, but it wouldn't make much sense. In the framework I do have an implementation of Preference which is supposed to encapsulate a binary value like this. Its existence means a 'yes' and, as far as the framework is concerned, means the user expresses a '1.0' preference for the item. That value doesn't really matter.

(And yes, it would be more efficient to not have such a simple dummy implementation of Preference to represent this. I threw it in since it fits cleanly in the framework. Get it right first -- then make it fast. If there is interest in these areas then we can start making more customized versions of User and some of the algorithms that take advantage of the fact that preferences are binary.)

> What do people use in such scenarios? Would it make sense to use
> http://en.wikipedia.org/wiki/Jaccard_index for such cases?
> ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly
> that, good.
>
> But then my question is:
> Doesn't the use of Jaccard/Tanimoto mean going back to the expensive
> user-user similarity computation?

TanimotoCoefficientSimilarity implements both UserSimilarity and ItemSimilarity, so it can be plugged into either a user-based or an item-based recommender, which need a UserSimilarity or an ItemSimilarity, respectively. So, no, you aren't forced into user-based recommenders in this context.
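
To make the Tanimoto point concrete, here is a rough sketch of the coefficient on binary preferences. It is plain Python, not the Taste API; the names `tanimoto`, `prefs` and `item_users` and the sample data are made up for illustration, and returning 0.0 for two empty sets is my assumption. The same function works on users (compared by the items they have) and on items (compared by the users who have them), which is why one similarity can serve both kinds of recommender.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Assumption: two empty sets are treated as having no similarity.
        return 0.0
    return len(a & b) / union


# Hypothetical binary data: user -> set of items the user has a preference for.
prefs = {
    "alice": {"A", "B", "C"},
    "bob": {"B", "C", "D"},
    "carol": {"D"},
}

# Invert the mapping to get item -> set of users with a preference for it.
item_users = defaultdict(set)
for user, items in prefs.items():
    for item in items:
        item_users[item].add(user)

# User-user similarity: compare the item sets of two users.
user_sim = tanimoto(prefs["alice"], prefs["bob"])

# Item-item similarity: compare the user sets of two items.
item_sim = tanimoto(item_users["B"], item_users["C"])
```

With this data, alice and bob share 2 of 4 distinct items, and items B and C are held by exactly the same users. The preference value itself never enters the calculation, only whether a preference exists.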
