[ https://issues.apache.org/jira/browse/MAHOUT-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896474#action_12896474 ]

Yanir Seroussi commented on MAHOUT-455:
---------------------------------------

Sorry for filing the JIRA issue without discussing it on the mailing list 
first; I didn't realize that this is how it's supposed to be done.

Anyway, I want to implement a fix that checks n against the number of users in 
the DataModel passed to the constructor, but the problem is that 
DataModel.getNumUsers() throws TasteException. If I change the constructors to 
rethrow the TasteException, it will break backward compatibility, while catching 
the exception doesn't really make sense.
On a side note, I don't really see why DataModel.getNumUsers() is declared to 
throw TasteException, as none of its implementations throws this exception (as 
far as I can see), and I can't think of a case where a DataModel client will 
catch such an exception and do something useful with it, but I might be 
mistaken.
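
A minimal sketch of one possible way around this dilemma (an assumption, not a tested patch and not what this comment proposes): keep the constructor signature unchanged, and clamp n lazily inside a method that is already allowed to throw TasteException. It uses only the JDK; TasteException and UserCounter below are hypothetical stand-ins for the Mahout types, and ClampedNeighborhoodSize and effectiveN are made-up names.

```java
// Hypothetical stand-in for Mahout's checked TasteException.
class TasteException extends Exception {
  TasteException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for the single DataModel method discussed here.
interface UserCounter {
  int getNumUsers() throws TasteException;
}

// Sketch: the constructor does not call getNumUsers(), so it needs no new
// checked exception; the clamp happens later, where TasteException may propagate.
class ClampedNeighborhoodSize {
  private final int n;
  private final UserCounter dataModel;

  ClampedNeighborhoodSize(int n, UserCounter dataModel) {
    // Assumed validation: a neighborhood size below 1 is rejected up front.
    if (n < 1) {
      throw new IllegalArgumentException("n must be at least 1, got " + n);
    }
    this.n = n;
    this.dataModel = dataModel;
  }

  // Returns min(n, numUsers), so the size never exceeds the data model itself.
  int effectiveN() throws TasteException {
    return Math.min(n, dataModel.getNumUsers());
  }
}
```

The trade-off is that an oversized n is only noticed when a neighborhood is first requested, not when the object is constructed.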

Setting an arbitrary upper bound on n that is independent of the data model 
seems overly restrictive to me, as any number we choose has no relation to the 
properties of the data model. Ted, is there a bound that you think would be 
suitable?

By the way, I'm not sure if this is the right place for discussing these 
issues, so I apologise in advance if I got it wrong.

> NearestNUserNeighborhood problems with large Ns
> -----------------------------------------------
>
>                 Key: MAHOUT-455
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-455
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.3
>         Environment: Linux
>            Reporter: Yanir Seroussi
>            Priority: Minor
>
> I set a large n for NearestNUserNeighborhood, with the intention of including 
> all users in the neighbourhood. However, I encountered the following problems:
> (1) If n is set to Integer.MAX_VALUE, the program crashes with the following 
> stack trace:
> Exception in thread "main" java.lang.IllegalArgumentException
>       at java.util.PriorityQueue.<init>(PriorityQueue.java:152)
>       at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
>       at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This is because TopItems.getTopUsers() tries to create a PriorityQueue with a 
> capacity of Integer.MAX_VALUE + 1, which overflows to a negative int.
> (2) If n is set to a large integer value (e.g., 1 billion), it crashes with 
> the following stack trace:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>       at java.util.PriorityQueue.<init>(PriorityQueue.java:153)
>       at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
>       at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This has the same cause: TopItems.getTopUsers() tries to create a PriorityQueue 
> with a capacity of n + 1, and allocating its backing array exhausts the heap.
> In my opinion, this should be fixed by changing n to the number of users in 
> the DataModel when NearestNUserNeighborhood is created, or by letting users 
> specify n = -1 (or a similar value) when they want the user neighbourhood to 
> include all users.
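
The arithmetic behind problem (1) is plain Java int overflow: Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE, and the PriorityQueue(int) constructor throws IllegalArgumentException for any initial capacity below 1. The JDK-only sketch below reproduces that and shows a hypothetical capacity helper combining both suggested fixes (clamping n to the number of users, and treating n <= 0, e.g. -1, as "all users"). CapacityDemo and its methods are made-up names for illustration, not Mahout's TopItems code, and the user count is passed in as a plain int assumed to be below Integer.MAX_VALUE.

```java
import java.util.PriorityQueue;

// Hypothetical illustration; not Mahout's TopItems implementation.
class CapacityDemo {
  // What problem (1) boils down to: n + 1 wraps around when n == Integer.MAX_VALUE.
  static int naiveCapacity(int n) {
    return n + 1;
  }

  // Hypothetical helper: n <= 0 (e.g. -1) means "all users", and n is never
  // allowed to exceed the number of users.
  // Assumes 0 <= numUsers < Integer.MAX_VALUE, so the + 1 cannot overflow.
  static int safeCapacity(int n, int numUsers) {
    int effectiveN = (n <= 0) ? numUsers : Math.min(n, numUsers);
    return effectiveN + 1;
  }

  // True if PriorityQueue rejects the capacity (it requires initialCapacity >= 1).
  static boolean constructorRejects(int capacity) {
    try {
      new PriorityQueue<Double>(capacity);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }
}
```

For a model with 943 users, safeCapacity(Integer.MAX_VALUE, 943) and safeCapacity(-1, 943) both return 944, so the queue never needs more than numUsers + 1 slots; the same clamp also avoids the huge allocation in problem (2).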

-- 
This message is automatically generated by JIRA.