Thanks, that's what I thought. Now, I would like to store a neighborhood for every one of my users. If, for one user, I need to compute the similarity between that user and all other users, then doing this for every user means computing all pairs, right? Or is there something better?
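A back-of-the-envelope count helps frame this question. The sketch below is plain Python, not Mahout code, and both function names are hypothetical. It assumes a symmetric similarity whose results are reused. Under that assumption, building one user's neighborhood by brute force costs n - 1 comparisons, and precomputing every user's neighborhood costs one comparison per unordered pair, n(n - 1)/2. Comparing users only within disjoint clusters, the idea raised in the original question, costs the sum of k(k - 1)/2 over the cluster sizes k.

```python
from typing import Dict, Iterable


def brute_force_comparisons(num_users: int) -> Dict[str, int]:
    """Similarity evaluations needed without clustering or sampling (hypothetical helper)."""
    return {
        # one user compared against every other user
        "one_neighborhood": num_users - 1,
        # every unordered pair once, assuming sim(a, b) == sim(b, a) and results are reused
        "all_neighborhoods": num_users * (num_users - 1) // 2,
    }


def within_cluster_comparisons(cluster_sizes: Iterable[int]) -> int:
    """Pairs left if users are only compared inside their own (disjoint) cluster."""
    return sum(k * (k - 1) // 2 for k in cluster_sizes)


if __name__ == "__main__":
    # Illustrative figures only: 1,000,000 users, optionally split into 1,000 clusters of 1,000
    print(brute_force_comparisons(1_000_000))
    # {'one_neighborhood': 999999, 'all_neighborhoods': 499999500000}
    print(within_cluster_comparisons([1_000] * 1_000))
    # 499500000
```

Under these assumptions, equal-sized clustering cuts the pair count by roughly the number of clusters. The cost is that users who land in different clusters are never compared.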
Does the getNeighborhood method do that, i.e., compute the similarity with all users?

Thanks

srowen wrote:
>
> Do you need to compute the similarity between all pairs of users in order to measure the similarity between any two users? No, not at all. There are several implementations of UserSimilarity, and in general they look only at the data associated with the two users being compared, not at all users.
>
> Computing a neighborhood is different. There, in theory, you do need to compute the similarity between one user and all other users (but still not all pairs), and pick some set of the most similar users. (And there are optimizations: for example, you could sample 10% of all other users to form a "pretty good" neighborhood rather than actually look at everyone else.)
>
> You bring up clustering. Indeed, that is one approach. You start by clustering users (basically, making a bunch of disjoint neighborhoods ahead of time) and then recommend from within the cluster. Still, you can do that somewhat more efficiently than looking at all pairs. See TreeClusteringRecommender.
>
> Yes, anything that requires looking at all pairs of users could be disastrously slow.
>
> If you have a lot of users but few items, consider using an item-based recommender instead. This would scale better.
>
> On Tue, Jul 7, 2009 at 12:36 AM, charlysf <[email protected]> wrote:
>>
>> Hello,
>>
>> I'm currently working on a small database. I understand that, when I need the similarity between users, it basically means computing it between all pairs of users.
>>
>> Is that right, or is there something better?
>> If that's it, how can I expect a fast computation for 1 million rows?
>>
>> I don't see the difference between asking for the neighborhood and computing the similarity for all pairs of users.
>>
>> Because I thought something could be interesting:
>> make some clusters of users, and only compute the similarity between users in my cluster.
>>
>> Thanks
>> --
>> View this message in context:
>> http://www.nabble.com/Compute-similarities-for-an-hudge-quantity-of-data-tp24364711p24364711.html
>> Sent from the Mahout User List mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Compute-similarities-for-an-hudge-quantity-of-data-tp24364711p24364984.html
Sent from the Mahout User List mailing list archive at Nabble.com.
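To make the distinction in srowen's reply concrete, here is a minimal, self-contained Python sketch. It is not Mahout's getNeighborhood or any Mahout API, and all names are hypothetical. `pearson` reads only the two users' own ratings, mirroring a pairwise UserSimilarity that needs no other users' data. `nearest_n_neighborhood` compares one user against the other users (optionally only a random fraction of them, as in the 10% sampling idea) and keeps the n most similar. The in-memory data layout and the choice of Pearson correlation over co-rated items are assumptions made for illustration.

```python
import heapq
import math
import random
from typing import Dict, List, Optional

# Hypothetical in-memory layout: user ID -> {item ID: rating}
Prefs = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over co-rated items; reads only these two users' data."""
    common = list(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0  # too little overlap: treated as "unrelated" (an assumption)
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a user with constant ratings has undefined correlation
    return cov / (sd_x * sd_y)


def nearest_n_neighborhood(
    user: str,
    prefs: Prefs,
    n: int,
    sample_rate: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Up to n users most similar to `user`.

    With sample_rate == 1.0 this makes len(prefs) - 1 similarity calls (one user
    against all others, not all pairs). A smaller sample_rate compares against a
    random fraction of the other users instead, trading accuracy for speed.
    """
    rng = rng or random.Random()
    candidates = [u for u in prefs if u != user]
    if sample_rate < 1.0:
        k = min(len(candidates), max(1, int(len(candidates) * sample_rate)))
        candidates = rng.sample(candidates, k)
    scored = ((pearson(prefs[user], prefs[u]), u) for u in candidates)
    # keep the n highest-scoring users; equal scores fall back to comparing user IDs
    return [u for _, u in heapq.nlargest(n, scored)]


if __name__ == "__main__":
    prefs: Prefs = {
        "alice": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
        "bob": {"i1": 4.0, "i2": 2.0, "i3": 5.0},
        "carol": {"i1": 1.0, "i2": 5.0, "i3": 2.0},
        "dave": {"i1": 5.0, "i2": 3.0, "i3": 4.5},
    }
    print(nearest_n_neighborhood("alice", prefs, n=2))  # ['dave', 'bob']
```

Calling this function once per user to precompute every neighborhood would evaluate each pair twice (once from each side) unless results are cached; sampling or clustering is what actually shrinks that total.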
