I would think this is really just a small variation on the nearest-n version, which only ever keeps up to n users in consideration. You just add an additional filter criterion. So yes, I agree: your second approach is right.
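To make the "small variation" concrete, here is a rough Java sketch of a nearest-n scan with the extra minSimilarity filter added. It is not Mahout's NearestNUserNeighborhood code. The class ThresholdedNearestN, its neighborhood method, and the use of a plain ToDoubleBiFunction as the similarity are hypothetical stand-ins for illustration only. Users below minSimilarity are skipped during the scan, so the heap never holds more than n users.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch, not Mahout's actual NearestNUserNeighborhood:
 * keep at most n most-similar users, skipping anyone below minSimilarity.
 */
final class ThresholdedNearestN {

  private ThresholdedNearestN() {}

  /** A candidate user paired with its similarity to the target user. */
  private static final class Scored {
    final long userID;
    final double similarity;

    Scored(long userID, double similarity) {
      this.userID = userID;
      this.similarity = similarity;
    }
  }

  static List<Long> neighborhood(long targetUserID,
                                 Collection<Long> allUserIDs,
                                 ToDoubleBiFunction<Long, Long> similarity,
                                 int n,
                                 double minSimilarity) {
    // Min-heap on similarity: the head is the weakest user kept so far.
    PriorityQueue<Scored> heap = new PriorityQueue<>(
        Math.max(1, n), Comparator.comparingDouble((Scored s) -> s.similarity));
    for (Long userID : allUserIDs) {
      if (userID == targetUserID) {
        continue; // never put the target user in its own neighbourhood
      }
      double sim = similarity.applyAsDouble(targetUserID, userID);
      // The additional filter criterion: NaN or below-threshold users never enter.
      if (Double.isNaN(sim) || sim < minSimilarity) {
        continue;
      }
      if (heap.size() < n) {
        heap.add(new Scored(userID, sim));
      } else if (n > 0 && sim > heap.peek().similarity) {
        heap.poll(); // evict the weakest kept user
        heap.add(new Scored(userID, sim));
      }
    }
    // Return the kept users ordered from most to least similar.
    List<Scored> kept = new ArrayList<>(heap);
    kept.sort(Comparator.comparingDouble((Scored s) -> s.similarity).reversed());
    List<Long> result = new ArrayList<>(kept.size());
    for (Scored s : kept) {
      result.add(s.userID);
    }
    return result;
  }
}
```

Filtering during the scan gives the same neighbourhood as "take the top n, then drop anyone below minSimilarity" (ties at the cutoff aside): if any of the top n falls below the threshold, every user outside the top n does too. For example, with similarities 0.9, 0.7, 0.5 and 0.1, n = 2 and minSimilarity = 0.3, the neighbourhood is the two users at 0.9 and 0.7.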
On Fri, Nov 7, 2008 at 8:26 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Of course, now that I put this in JIRA I'm wondering about treating similarity
> as the main neighbourhood membership determiner...
> In other words, what I wrote says:
> Include all users whose similarity to the target user is > minSimilarity. Then,
> if the hood is large, optionally trim the hood to maxHoodSize.
>
> Scary things happen (read: slowness) if you use minSimilarity=0.001 or some
> other small number. This will create a large hood.
>
> So now I'm wondering if one should use maxHoodSize as the primary determiner,
> so that the code instead does this:
> Include the top maxHoodSize users. Then remove all users whose similarity to
> the target user is < minSimilarity.
>
> I tested both approaches and they are equally fast IF you pick a good
> minSimilarity. But if you pick an overly low similarity... ouch: huge hood
> + slow. If you pick too high a minSimilarity, you risk finding no users that
> meet that criterion.
>
> The drawback of the purely n-nearest approach is that the n-nearest people may
> really not be very near. Consequently, recommendations derived from them
> will not be the best. My change tries to guard against that, but one might
> argue that getting some not-so-good recommendations is still better than
> getting no recommendations (e.g. because the given minSimilarity disqualifies
> all users and results in a 0-sized neighbourhood).
>
> Thinking out loud...
> > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > ----- Original Message ---- >> From: Otis Gospodnetic (JIRA) <[EMAIL PROTECTED]> >> To: [email protected] >> Sent: Friday, November 7, 2008 1:04:54 PM >> Subject: [jira] Created: (MAHOUT-95) UserSimilarity-based >> NearestNNeighborhood >> >> UserSimilarity-based NearestNNeighborhood >> ----------------------------------------- >> >> Key: MAHOUT-95 >> URL: https://issues.apache.org/jira/browse/MAHOUT-95 >> Project: Mahout >> Issue Type: Improvement >> Components: Collaborative Filtering >> Reporter: Otis Gospodnetic >> Priority: Minor >> Attachments: UserSimilarityNearestNUserNeighborhood.java >> >> A variation of NearestNUserNeighborhood. This version adds the minSimilarity >> parameter, which is the primary factor for including/excluding other users >> from >> the target user's neighbourhood. Additionally, the 'n' parameter was >> renamed to >> maxHoodSize and is used to optionally limit the size of the neighbourhood. >> >> The patch is for a brand new class, but we may really want just a single >> class >> (either keep this one and axe NearestNUserNeighborhood or add this >> functionality >> to NearestNUserNeighborhood), if this sounds good. >> >> I'll update the unit test and provide a patch for that if others think this >> can >> go in. >> >> Thoughts? >> >> >> -- >> This message is automatically generated by JIRA. >> - >> You can reply to this email to add a comment to the issue online. >
