Re: Compute similarities for a huge quantity of data

charlysf Mon, 06 Jul 2009 17:04:37 -0700

Thanks, that's what I thought.
Now I would like to store the neighborhood for all my users. So if, for one
user, I need to compute the similarity between that user and all other
users, do I end up computing all pairs? Or is there something better?

Does the getNeighborhood method do that, i.e. compute against all users?

Thanks


srowen wrote:
> 
> Do you need to compute the similarity between all pairs of users in
> order to measure similarity between any two users? No, not at all.
> There are several implementations of UserSimilarity, and in general
> they only look at the data associated with the two users being
> compared, not all users.
> 
> Computing a neighborhood is different. There, in theory, you do need
> to compute the similarity between one user, and all other users (but
> still, not all pairs), and pick some set of most-similar users. (And
> there are optimizations -- for example, you could sample 10% of all
> other users to form a "pretty good" neighborhood rather than actually
> look at everyone else.)
> 
> You bring up clustering. Indeed that is one approach. You start by
> clustering users -- basically, making a bunch of disjoint
> neighborhoods ahead of time -- and then recommending from within the
> cluster. You can do that somewhat more efficiently than looking at all
> pairs, still. See TreeClusteringRecommender.
> 
> Yes, anything that requires looking at all pairs of users could be
> disastrously slow.
> 
> If you have a lot of users, but few items, consider using an
> item-based recommender instead. This would scale better.
> 
> On Tue, Jul 7, 2009 at 12:36 AM, charlysf wrote:
>>
>> Hello,
>>
>> I am currently working on a small database. I understand that, when I
>> need the similarity between users, it basically means computing it
>> between all pairs of users.
>>
>> Is that right, or is there a better way?
>> If that is right, how can I expect a quick computation for 1 million rows?
>>
>> I don't see the difference between asking for the neighborhood and
>> computing the similarity for all pairs of users.
>>
>> I also thought the following could be interesting:
>> make some clusters of users, and only compute the similarity between
>> users in my cluster.
>>
>> Thanks
>> --
>> View this message in context:
>> http://www.nabble.com/Compute-similarities-for-an-hudge-quantity-of-data-tp24364711p24364711.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.
>>
>>
> 
> 
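
To make the distinction in the quoted reply concrete, here is a minimal
sketch in plain Java. It is not Mahout code. It shows that building one
user's neighborhood compares that user against every other user (n - 1
similarity computations), optionally against a random sample only, and
never needs all pairs. The class NeighborhoodSketch, its method names, the
choice of cosine similarity, and the samplingRate parameter are assumptions
made for illustration; Mahout's own UserSimilarity and neighborhood classes
may work differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Mahout.
final class NeighborhoodSketch {

    // Cosine similarity between two users' rating vectors.
    // It reads only the data of the two users being compared.
    static double similarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Returns up to n user IDs most similar to userId.
    // samplingRate = 1.0 looks at every other user; a lower value (assumed
    // knob, e.g. 0.1) compares against a random subset only.
    static List<Integer> nearestUsers(int userId, Map<Integer, double[]> ratings,
                                      int n, double samplingRate, Random rng) {
        double[] target = ratings.get(userId);
        Map<Integer, Double> scores = new HashMap<>();
        List<Integer> candidates = new ArrayList<>();
        for (Map.Entry<Integer, double[]> e : ratings.entrySet()) {
            if (e.getKey() == userId) {
                continue; // skip the user itself
            }
            if (samplingRate < 1.0 && rng.nextDouble() >= samplingRate) {
                continue; // not sampled
            }
            scores.put(e.getKey(), similarity(target, e.getValue()));
            candidates.add(e.getKey());
        }
        // Sort candidates by similarity, highest first, and keep the top n.
        candidates.sort(
            Comparator.comparingDouble((Integer id) -> scores.get(id)).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

Calling nearestUsers once costs at most n - 1 comparisons. Storing a
neighborhood for every user repeats that call per user, which amounts to
roughly n(n - 1)/2 distinct pairs for a symmetric similarity unless sampling
or clustering cuts the candidate set down.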

