[ https://issues.apache.org/jira/browse/MAHOUT-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895607#action_12895607 ]

Yanir Seroussi commented on MAHOUT-455:
---------------------------------------

With all due respect, responding in such a condescending way is indeed quite 
rude and discouraging. I went through the trouble of opening an account and 
reporting this issue, which I still think is an issue, in order to help the 
developers. Making your clients/customers feel like idiots is not really the 
way to go. However, I will take the time to explain why I still think this is a 
problem.

First, from an API design point of view, it is well known that APIs should 
"throw exceptions appropriate to the abstraction" (Effective Java, 2nd ed., 
Item 61). Every method along the way is declared to throw TasteException 
(which ignores two other Effective Java guidelines, Item 59, "Avoid 
unnecessary use of checked exceptions", and Item 60, "Favor the use of 
standard exceptions", but never mind). Given that, it makes sense to throw 
exceptions that clients can understand without digging into the code.

Second, it is common practice to test neighbour-based collaborative filtering 
systems by varying the number of neighbours. For example, Herlocker et al. 
(1999) (http://portal.acm.org/citation.cfm?id=312682) experimented with 
neighbourhood sizes of up to 100, while Koren (2008) 
(http://portal.acm.org/citation.cfm?id=1401944) tested neighbourhood sizes of 
up to infinity, which is exactly what I was trying to do.

I think that at the very least, NearestNUserNeighborhood's constructor could 
throw an IllegalArgumentException if n is larger than the number of users in 
the DataModel. This would make the API easier to use because the exception 
would come from the upper level, rather than from the depths of the 
implementation.
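
To make the suggestion concrete, here is a minimal sketch of the kind of 
up-front check I have in mind. It does not use Mahout's classes: the class 
NeighborhoodSizeCheck and the method requireValidSize are hypothetical names, 
and numUsers stands in for whatever user count the DataModel reports.

```java
// Hypothetical helper, not part of Mahout: validates a neighbourhood size
// before any data structure is allocated for it.
final class NeighborhoodSizeCheck {

    private NeighborhoodSizeCheck() {
    }

    /**
     * Returns n unchanged if it is a usable neighbourhood size, otherwise
     * throws an exception that names the offending argument.
     *
     * @param n        requested neighbourhood size
     * @param numUsers number of users available (assumed to come from the data model)
     */
    public static int requireValidSize(int n, int numUsers) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1, got " + n);
        }
        if (n > numUsers) {
            throw new IllegalArgumentException(
                "n (" + n + ") is larger than the number of users (" + numUsers + ")");
        }
        return n;
    }

    public static void main(String[] args) {
        // A size within range passes through unchanged.
        System.out.println(requireValidSize(50, 1000));
        try {
            requireValidSize(Integer.MAX_VALUE, 1000);
        } catch (IllegalArgumentException e) {
            // The message points at the argument, not at PriorityQueue internals.
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this in the constructor, the failure surfaces where the bad 
value was passed in, with a message that mentions n.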

> NearestNUserNeighborhood problems with large Ns
> -----------------------------------------------
>
>                 Key: MAHOUT-455
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-455
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.3
>         Environment: Linux
>            Reporter: Yanir Seroussi
>            Priority: Minor
>
> I set a large n for NearestNUserNeighborhood, with the intention of including 
> all users in the neighbourhood. However, I encountered the following problems:
> (1) If n is set to Integer.MAX_VALUE, the program crashes with the following 
> stack trace:
> Exception in thread "main" java.lang.IllegalArgumentException
>       at java.util.PriorityQueue.<init>(PriorityQueue.java:152)
>       at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
>       at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This is because TopItems.getTopUsers() tries to create a PriorityQueue with a 
> capacity of Integer.MAX_VALUE + 1, which overflows to a negative value that 
> the PriorityQueue constructor rejects.
> (2) If n is set to a large integer value (e.g., 1 billion), it crashes with 
> the following stack trace:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>       at java.util.PriorityQueue.<init>(PriorityQueue.java:153)
>       at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
>       at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This has the same cause: TopItems.getTopUsers() tries to create a 
> PriorityQueue with a capacity of n + 1, which here exhausts the heap.
> In my opinion, this should be fixed by changing n to the number of users in 
> the DataModel when NearestNUserNeighborhood is created, or by letting users 
> specify n = -1 (or a similar value) when they want the user neighbourhood to 
> include all users.
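
The overflow in (1) and the clamping fix proposed above can be sketched as 
follows. This is only an illustration under stated assumptions: the class 
LargeNeighborhoodDemo and the method effectiveSize are hypothetical names, not 
Mahout code, and numUsers stands in for the user count of the DataModel.

```java
import java.util.PriorityQueue;

// Hypothetical demo, not part of Mahout.
final class LargeNeighborhoodDemo {

    private LargeNeighborhoodDemo() {
    }

    /**
     * Maps a requested neighbourhood size to one that is safe to allocate.
     * n == -1 is treated as "all users" (the convention suggested in the report).
     */
    public static int effectiveSize(int n, int numUsers) {
        if (n == -1) {
            return numUsers;
        }
        if (n < 1) {
            throw new IllegalArgumentException("n must be -1 or at least 1, got " + n);
        }
        // Never ask for more neighbours than there are users.
        return Math.min(n, numUsers);
    }

    public static void main(String[] args) {
        int n = Integer.MAX_VALUE;

        // int arithmetic wraps around, so n + 1 becomes Integer.MIN_VALUE.
        int overflowed = n + 1;
        System.out.println(overflowed == Integer.MIN_VALUE);

        try {
            // PriorityQueue rejects any initial capacity below 1.
            new PriorityQueue<Integer>(overflowed);
        } catch (IllegalArgumentException e) {
            System.out.println("capacity rejected: " + overflowed);
        }

        // After clamping, the queue is sized by the data rather than by n.
        int size = effectiveSize(n, 1000);
        PriorityQueue<Integer> queue = new PriorityQueue<Integer>(size + 1);
        System.out.println(size + " " + queue.size());
    }
}
```

Clamping also avoids the OutOfMemoryError in (2), because the capacity can 
never exceed the number of users plus one.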

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
