The demo is intended to work on the 1M rating data set from GroupLens, so that is the data to use if you just want to see the demo run.
It uses slope-one, which is memory intensive (memory use scales as the square of the number of items). This is OK for a demo, but with 10x more items it is probably going to run through a 1.6GB heap. You could raise your heap size, but without knowing what your data set is like I can't predict how much heap you'll need. It depends on the number of items.

If you're interested in what levers you could pull to get a more functional implementation here:

- You can tweak how much data the MemoryDiffStorage implementation keeps, to fit it into less memory. By default it keeps a lot -- too much for a reasonably big data set.
- You can store the diffs in a database with JDBCDiffStorage.
- You can just try a simple user-based recommender instead and avoid this entirely.

My rough guideline is that 10M data points need about 500MB of heap to load.

This example actually has nothing to do with Hadoop.

Sean

On Tue, Oct 27, 2009 at 9:43 AM, michal shmueli <[email protected]> wrote:
> Hi,
>
> I'm trying to utilize the taste demo (grouplens) with my data which consists
> of ~700,000 users with ~10M ratings. I'm using an Hadoop cluster with 4
> machines and
> also set the MAVEN_OPTS=-Xmx1660M. I keep getting out of memory error
> (below). I understand that Hadoop is not necessary for taste, however, is
> there a way to utilize my cluster total memory (which is large) to run
> this.
> If not, what is the limitation with respect to number of users/ratings that
> I should expect.
>
> thanks.
> Michal
>
>
>
> Oct 22, 2009 2:26:41 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Building average diffs...
> 2009-10-22 14:27:30.904:WARN::FAILED taste-recommender:
> java.lang.OutOfMemoryError: Java heap space
> 2009-10-22 14:27:30.904:WARN::FAILED
> jettywebappcont...@2a5b8e8c
> @2a5b8e8c/,file:/home/michal/trunk/taste-web/target/tmp/webapp/,/home/michal/trunk/taste-web/target/mahout-taste-webapp-0.2-SNAPSHOT.war:
> java.lang.OutOfMemoryError: Java heap space
> 2009-10-22 14:27:30.904:WARN::FAILED
> contexthandlercollect...@3cccc621: java.lang.OutOfMemoryError: Java heap space
> 2009-10-22 14:27:30.904:WARN::FAILED handlercollect...@27e3bfb6:
> java.lang.OutOfMemoryError: Java heap space
> 2009-10-22 14:27:30.904:WARN::Error starting handlers
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.cf.taste.impl.common.FastByIDMap.rehash(FastByIDMap.java:260)
>     at org.apache.mahout.cf.taste.impl.common.FastByIDMap.growAndRehash(FastByIDMap.java:247)
>     at org.apache.mahout.cf.taste.impl.common.FastByIDMap.put(FastByIDMap.java:154)
>     at org.apache.mahout.cf.taste.impl.recommender.slopeone.MemoryDiffStorage.processOneUser(MemoryDiffStorage.java:286)
>     at org.apache.mahout.cf.taste.impl.recommender.slopeone.MemoryDiffStorage.buildAverageDiffs(MemoryDiffStorage.java:220)
>     at org.apache.mahout.cf.taste.impl.recommender.slopeone.MemoryDiffStorage.<init>(MemoryDiffStorage.java:115)
>     at org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender.<init>(SlopeOneRecommender.java:63)
>     at org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender.<init>(GroupLensRecommender.java:56)
>     at org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender.<init>(GroupLensRecommender.java:45)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at java.lang.Class.newInstance0(Class.java:355)
>     at java.lang.Class.newInstance(Class.java:308)
>     at org.apache.mahout.cf.taste.web.RecommenderSingleton.<init>(RecommenderSingleton.java:51)
>     at org.apache.mahout.cf.taste.web.RecommenderSingleton.initializeIfNeeded(RecommenderSingleton.java:42)
>     at org.apache.mahout.cf.taste.web.RecommenderServlet.init(RecommenderServlet.java:74)
>     at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:421)
>     at org.eclipse.jetty.servlet.ServletHolder.doStart(ServletHolder.java:245)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:694)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:193)
>     at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:913)
>     at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:584)
>     at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:341)
>     at org.mortbay.jetty.plugin.JettyWebAppContext.doStart(JettyWebAppContext.java:102)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
>     at org.eclipse.jetty.server.handler.HandlerCollection.doStart(HandlerCollection.java:164)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>     at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:55)
>     at org.eclipse.jetty.server.handler.HandlerCollection.doStart(HandlerCollection.java:164)
> 2009-10-22 14:27:31.534:INFO::Started [email protected]:8080
> [INFO] Started Jetty Server
>
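
For anyone who wants to put rough numbers on the sizing remarks in the reply, here is a back-of-the-envelope sketch in plain Java. It is not part of Mahout. The ~50 bytes per rating is simply the "10M data points, about 500MB" guideline divided out. The per-diff cost (ASSUMED_BYTES_PER_DIFF) and the class and method names are hypothetical, picked only for illustration, and the diff figure is a worst case that assumes every pair of items has been co-rated by someone.

```java
public class HeapEstimate {
    // From the guideline in the reply: 10M data points -> roughly 500MB of heap.
    static final long BYTES_PER_RATING = 500_000_000L / 10_000_000L;

    // ASSUMPTION: hypothetical cost of one stored item-item diff. The real
    // figure depends on MemoryDiffStorage internals and is not measured here.
    static final long ASSUMED_BYTES_PER_DIFF = 32L;

    // Number of distinct item pairs; this is the quadratic term in slope-one.
    static long itemPairs(long items) {
        return items * (items - 1) / 2;
    }

    // Rough heap needed just to load the ratings.
    static long dataModelBytes(long ratings) {
        return ratings * BYTES_PER_RATING;
    }

    // Upper bound on diff storage if every item pair had a diff.
    static long worstCaseDiffBytes(long items) {
        return itemPairs(items) * ASSUMED_BYTES_PER_DIFF;
    }

    public static void main(String[] args) {
        long ratings = 10_000_000L;
        // Item count is a placeholder; pass your own as the first argument.
        long items = args.length > 0 ? Long.parseLong(args[0]) : 10_000L;

        System.out.printf("ratings: about %d MB%n",
                dataModelBytes(ratings) / 1_000_000L);
        System.out.printf("diffs for %d items: at most about %d MB%n",
                items, worstCaseDiffBytes(items) / 1_000_000L);
    }
}
```

The point of the exercise is the shape rather than the exact numbers: multiplying the item count by 10 multiplies the number of possible diffs by about 100, which is why capping what MemoryDiffStorage keeps, or moving the diffs out of the heap, matters much more than the size of the ratings file.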
