I would think this is really just a small variation on the nearest-n
version, which only ever keeps up to n users in consideration. You
just add an additional filter criterion. So yes, I agree: your second
approach is right.
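For comparison, here is a minimal Java sketch of the two orderings discussed in this thread. It assumes a hypothetical `Map<String, Double>` from user ID to precomputed similarity to the target user, not Mahout's actual neighborhood or `UserSimilarity` interfaces. The class and method names are made up for illustration. `nearestNFirst` is the "nearest-n plus an extra filter" variant: a bounded min-heap never holds more than maxHoodSize users.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Mahout's actual API): candidate users are given as a
 * map from user ID to that user's precomputed similarity to the target user.
 */
public final class NeighborhoodSketch {

    // Orders entries by similarity, lowest first.
    private static final Comparator<Map.Entry<String, Double>> BY_SIMILARITY =
            Map.Entry.comparingByValue();

    private NeighborhoodSketch() {}

    /**
     * Similarity first: include every user whose similarity is > minSimilarity,
     * then optionally trim to the maxHoodSize most similar users (maxHoodSize <= 0
     * means no limit). A very low minSimilarity makes the intermediate list huge.
     */
    public static List<String> similarityFirst(Map<String, Double> sims,
                                               double minSimilarity, int maxHoodSize) {
        List<Map.Entry<String, Double>> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            if (e.getValue() > minSimilarity) {
                kept.add(e);
            }
        }
        kept.sort(BY_SIMILARITY.reversed()); // most similar first
        if (maxHoodSize > 0 && kept.size() > maxHoodSize) {
            kept = kept.subList(0, maxHoodSize);
        }
        return ids(kept);
    }

    /**
     * Nearest-n first: keep at most maxHoodSize users in a bounded min-heap and
     * skip anyone whose similarity is < minSimilarity. Ties aside, this gives the
     * same set as "take the top maxHoodSize, then drop those below minSimilarity".
     */
    public static List<String> nearestNFirst(Map<String, Double> sims,
                                             double minSimilarity, int maxHoodSize) {
        if (maxHoodSize <= 0) {
            throw new IllegalArgumentException("maxHoodSize must be positive");
        }
        // The heap's root is the least similar user currently kept.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(BY_SIMILARITY);
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            if (e.getValue() < minSimilarity) {
                continue; // the additional filter criterion
            }
            heap.offer(e);
            if (heap.size() > maxHoodSize) {
                heap.poll(); // evict the weakest so at most maxHoodSize remain
            }
        }
        List<Map.Entry<String, Double>> kept = new ArrayList<>(heap);
        kept.sort(BY_SIMILARITY.reversed()); // most similar first
        return ids(kept);
    }

    // Extracts the user IDs, preserving order.
    private static List<String> ids(List<Map.Entry<String, Double>> entries) {
        List<String> result = new ArrayList<>(entries.size());
        for (Map.Entry<String, Double> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

Both methods return an empty list when minSimilarity exceeds every candidate's similarity, which is the zero-sized neighbourhood risk raised below. In `nearestNFirst`, however, a very low minSimilarity can no longer produce a huge neighbourhood, because the result is capped at maxHoodSize by construction.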

On Fri, Nov 7, 2008 at 8:26 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Of course, now that I've put this in JIRA, I'm wondering whether treating
> similarity as the main neighbourhood membership determiner is the right approach...
> In other words, what I wrote does this:
> Include all users whose similarity to the target user is > minSimilarity. Then,
> if the hood is large, optionally trim it to maxHoodSize.
>
> Scary things happen (read: slowness) if you use minSimilarity=0.001 or some
> other small number, because this creates a large hood.
>
> So now I'm wondering if one should use maxHoodSize as the primary determiner,
> so that the code does this instead:
> Include the top maxHoodSize users. Then remove all users whose similarity to
> the target user is < minSimilarity.
>
> I tested both approaches and they are equally fast IF you pick a good
> minSimilarity. But if you pick an overly low minSimilarity... ouch: huge hood
> and slow. If you pick too high a minSimilarity, you risk finding no users that
> meet that criterion.
>
> The drawback of the pure n-nearest approach is that the n nearest people may
> not really be very near. Consequently, recommendations derived from them
> will not be the best. My change tries to guard against that, but one might
> argue that getting some not-so-good recommendations is still better than
> getting no recommendations (e.g., because the given minSimilarity disqualifies
> all users and results in a 0-sized neighbourhood).
>
> Thinking out loud...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Otis Gospodnetic (JIRA) <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Friday, November 7, 2008 1:04:54 PM
>> Subject: [jira] Created: (MAHOUT-95) UserSimilarity-based NearestNNeighborhood
>>
>> UserSimilarity-based NearestNNeighborhood
>> -----------------------------------------
>>
>>                  Key: MAHOUT-95
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-95
>>              Project: Mahout
>>           Issue Type: Improvement
>>           Components: Collaborative Filtering
>>             Reporter: Otis Gospodnetic
>>             Priority: Minor
>>          Attachments: UserSimilarityNearestNUserNeighborhood.java
>>
>> A variation of NearestNUserNeighborhood. This version adds the minSimilarity
>> parameter, which is the primary factor for including/excluding other users
>> from the target user's neighbourhood. Additionally, the 'n' parameter was
>> renamed to maxHoodSize and is used to optionally limit the size of the
>> neighbourhood.
>>
>> The patch is for a brand new class, but we may really want just a single
>> class (either keep this one and axe NearestNUserNeighborhood, or add this
>> functionality to NearestNUserNeighborhood), if this sounds good.
>>
>> I'll update the unit test and provide a patch for that if others think this
>> can go in.
>>
>> Thoughts?
>>
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>