Re: Recommending when working with binary data sets

Sean Owen Tue, 30 Sep 2008 05:35:22 -0700

Sorry for the late reply -- I've been traveling.

On Fri, Sep 26, 2008 at 6:52 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> I've been reading the chapter on recommendations in Programming Collective 
> Intelligence and looking at Taste.  The examples in PCI


(PS that is a really good book. Recommended -- highly recommended --
to everyone involved with Mahout. I kinda cross-checked what I had
done against the book and think it agrees. The book suggested more
good ideas, particularly the Tanimoto coefficient business.)

> I can't really use Euclidean distance or Pearson correlation coefficient, can 
> I?

You could, but it wouldn't make much sense. The framework does have
an implementation of Preference that is meant to encapsulate a binary
value like this. Its existence means 'yes', and as far as the
framework is concerned the user expresses a preference of 1.0 for the
item. The actual value doesn't really matter.

(And yes, it would be more efficient not to use such a simple dummy
implementation of Preference to represent this. I threw it in since it
fits cleanly into the framework. Get it right first -- then make it
fast. If there is interest in this area, we can start making more
customized versions of User and of some of the algorithms that take
advantage of the fact that preferences are binary.)


> What do people use in such scenarios?  Would it make sense to use 
> http://en.wikipedia.org/wiki/Jaccard_index for such cases?
> ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly 
> that, good.
>
> But then my question is:
> Doesn't the use of Jaccard/Tanimoto mean going back to the expensive 
> user-user similarity computation?

TanimotoCoefficientSimilarity implements both UserSimilarity and
ItemSimilarity, so it can be plugged into either a user-based or
item-based recommender, which need a UserSimilarity or ItemSimilarity,
respectively. So, no, you aren't forced into user-based recommenders
in this context.
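
To make the symmetry concrete, here is a minimal standalone sketch of
the Tanimoto (Jaccard) coefficient on plain Java sets. It does not use
the Taste API: the class TanimotoSketch, its methods, and the sample
data are hypothetical names made up for illustration, and returning 0
for two empty sets is an assumption. The same function compares two
users (by their sets of items) or two items (by their sets of users).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Taste/Mahout.
final class TanimotoSketch {

    /** Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|. */
    static <T> double tanimoto(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumption: no data at all means no similarity
        }
        // Iterate over the smaller set and probe the larger one.
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (T x : smaller) {
            if (larger.contains(x)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    /** Turns a user -> items map into an item -> users map. */
    static Map<String, Set<String>> invert(Map<String, Set<String>> itemsByUser) {
        Map<String, Set<String>> usersByItem = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : itemsByUser.entrySet()) {
            for (String item : e.getValue()) {
                usersByItem.computeIfAbsent(item, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return usersByItem;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> itemsByUser = new HashMap<>();
        itemsByUser.put("alice", new HashSet<>(Arrays.asList("a", "b", "c")));
        itemsByUser.put("bob", new HashSet<>(Arrays.asList("b", "c", "d")));

        // User-user: 2 shared items out of 4 distinct items, prints 0.5
        System.out.println(tanimoto(itemsByUser.get("alice"), itemsByUser.get("bob")));

        // Item-item: "b" and "c" are liked by the same two users, prints 1.0
        Map<String, Set<String>> usersByItem = invert(itemsByUser);
        System.out.println(tanimoto(usersByItem.get("b"), usersByItem.get("c")));
    }
}
```

Nothing in tanimoto() knows whether its sets hold items or users; only
the direction of the map differs. That is the sense in which one
similarity can back either kind of recommender.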
