[ https://issues.apache.org/jira/browse/MAHOUT-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895719#action_12895719 ]
Ted Dunning commented on MAHOUT-455:
------------------------------------
I apologize for the tone of my remark. I normally try not to make new users
feel that way in the Mahout mailing list.
Thank you for the correction and for taking the trouble to file the JIRA.
{quote}
I think that at the very least, NearestNUserNeighborhood's constructor could
throw an IllegalArgumentException if n is larger than the number of users in
the DataModel. This would make the API easier to use because the exception
would come from the upper level, rather than from the depths of the
implementation.
{quote}
THIS is a useful suggestion.
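A minimal sketch of the kind of fail-fast check the quoted suggestion describes. The class and method names (NeighborhoodSizeCheck, checkNeighborhoodSize) are hypothetical and not part of Mahout; in the real constructor the user count would presumably come from the DataModel, and this is only an illustration of the idea, not a proposed patch.

```java
public final class NeighborhoodSizeCheck {

  private NeighborhoodSizeCheck() {
  }

  /**
   * Hypothetical helper: validates a requested neighborhood size n against
   * the number of users, so that a bad value is rejected at construction
   * time instead of deep inside the neighborhood computation.
   */
  public static int checkNeighborhoodSize(int n, int numUsers) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be at least 1: " + n);
    }
    if (n > numUsers) {
      throw new IllegalArgumentException(
          "n (" + n + ") is larger than the number of users (" + numUsers + ")");
    }
    return n;
  }

  public static void main(String[] args) {
    // A sensible value passes through unchanged.
    System.out.println(checkNeighborhoodSize(5, 100));
    // An oversized value is rejected with a descriptive message.
    try {
      checkNeighborhoodSize(Integer.MAX_VALUE, 100);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With a check like this, the exception names n and the user count directly, rather than surfacing from java.util.PriorityQueue.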
> NearestNUserNeighborhood problems with large Ns
> -----------------------------------------------
>
> Key: MAHOUT-455
> URL: https://issues.apache.org/jira/browse/MAHOUT-455
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering
> Affects Versions: 0.3
> Environment: Linux
> Reporter: Yanir Seroussi
> Priority: Minor
>
> I set a large n for NearestNUserNeighborhood, with the intention of including
> all users in the neighbourhood. However, I encountered the following problems:
> (1) If n is set to Integer.MAX_VALUE, the program crashes with the following
> stack trace:
> Exception in thread "main" java.lang.IllegalArgumentException
> at java.util.PriorityQueue.<init>(PriorityQueue.java:152)
> at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
> at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This is because TopItems.getTopUsers() tries to create a PriorityQueue with a
> capacity of Integer.MAX_VALUE + 1, which overflows to a negative value.
> (2) If n is set to a large integer value (e.g., 1 billion), it crashes with
> the following stack trace:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.PriorityQueue.<init>(PriorityQueue.java:153)
> at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopUsers(TopItems.java:93)
> at org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood.getUserNeighborhood(NearestNUserNeighborhood.java:111)
> This has the same cause: TopItems.getTopUsers() tries to create a
> PriorityQueue with a capacity of n + 1.
> In my opinion, this should be fixed by changing n to the number of users in
> the DataModel when NearestNUserNeighborhood is created, or by letting users
> specify n = -1 (or a similar value) when they want the user neighbourhood to
> include all users.
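The first failure and the first proposed fix can be reproduced in isolation with a small sketch. The class and helper names (QueueCapacityDemo, capacityRejected, effectiveSize) are hypothetical and stand in for the Mahout code paths in the stack traces; only java.util.PriorityQueue and integer arithmetic are exercised here.

```java
import java.util.PriorityQueue;

public final class QueueCapacityDemo {

  private QueueCapacityDemo() {
  }

  /**
   * Hypothetical helper: clamps the requested neighborhood size to the
   * number of users actually available.
   */
  public static int effectiveSize(int n, int numUsers) {
    return Math.min(n, numUsers);
  }

  /**
   * Returns true if PriorityQueue refuses an initial capacity of n + 1.
   * For n = Integer.MAX_VALUE the sum wraps around to Integer.MIN_VALUE,
   * and PriorityQueue rejects any initial capacity below 1.
   */
  public static boolean capacityRejected(int n) {
    try {
      new PriorityQueue<Integer>(n + 1);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Integer overflow: MAX_VALUE + 1 wraps to MIN_VALUE.
    System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
    // The wrapped (negative) capacity is rejected by PriorityQueue.
    System.out.println(capacityRejected(Integer.MAX_VALUE));
    // Clamping n to the user count keeps the capacity small and valid.
    int clamped = effectiveSize(Integer.MAX_VALUE, 42);
    System.out.println(clamped + " -> rejected: " + capacityRejected(clamped));
  }
}
```

Clamping also avoids the second failure, since the queue is then never sized beyond the user count plus one. Whether to clamp silently or to reject an oversized n is a design choice that this sketch does not settle.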