Hi, I apologize if I've misunderstood the purpose of the Taste component of Mahout. Our goal was to take a recommendation framework and use our own recommendation algorithm within it. We need to process a massive amount of data, and wanted it to be done on our Hadoop grid. I thought that Taste was the right fit for the job. I'm not interested in the HTTP service. I'm interested in the recommendation framework, particularly from a back-end batch perspective. Does that help clarify? Thanks for helping me sort through this.
-Aurora

On 7/21/09 3:02 PM, "Sean Owen" <[email protected]> wrote:

Hmm, lots going on here, it's confusing. Are you trying to run this on
Hadoop intentionally? Because the web app example is not intended to run
on Hadoop. It's a component intended to serve recommendations over HTTP in
real time. It also appears you are running an evaluation rather than a web
app serving requests. I realize you're trying to run this without Jetty,
but that's kind of like trying to run a web app without a web server. I
think you'd have to clarify what you are trying to do, and then what you
are doing right now, to begin to assist.

On Tue, Jul 21, 2009 at 9:20 PM, Aurora Skarra-Gallagher
<[email protected]> wrote:
> Hi,
>
> I'm trying to run the Taste web example without using Jetty. Our gateways
> aren't meant to be used as webservers. By poking around, I found that the
> following command worked:
>
> hadoop --config ~/hod-clusters/test jar \
>   /x/mahout-current/examples/target/mahout-examples-0.2-SNAPSHOT.job \
>   org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner
>
> The output is:
>
> 09/07/21 19:59:21 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
> 09/07/21 19:59:21 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GroupLensDataModel
> 09/07/21 19:59:22 INFO file.FileDataModel: Reading file info...
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 100000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 200000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 300000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 400000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 500000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 600000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 700000 lines
> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 800000 lines
> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 900000 lines
> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 1000000 lines
> 09/07/21 19:59:23 INFO file.FileDataModel: Read lines: 1000209
> 09/07/21 19:59:30 INFO slopeone.MemoryDiffStorage: Building average diffs...
> 09/07/21 19:59:42 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7035965559003973
> 09/07/21 19:59:42 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 0.7035965559003973
>
> The job appears to write data to /tmp/ratings.txt and /tmp/movies.txt. I'm
> not sure if this is the correct way to run this example. I have a few
> questions:
>
> 1. Is the output file /tmp/ratings.txt? If so, how do I interpret it?
> 2. What does the Evaluation result mean?
> 3. Is it even running on HDFS?
> 4. Is it a map-reduce job?
>
> Any pointers on how to run this as a standalone job would be helpful.
>
> Thanks,
> Aurora
>
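[Editor's note: on the "standalone job" question above, the evaluator runner is a plain JVM program, not a map-reduce job, so it can be run with ordinary `java` given the Mahout jars on the classpath. A minimal sketch of the same kind of evaluation using the Taste API directly follows; it assumes the Mahout 0.2-era class names and `evaluate(...)` signature (which changed in later releases), a ratings file already in FileDataModel's comma-separated format at /tmp/ratings.txt, and a hypothetical class name `StandaloneEvaluation`.]

```java
import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class StandaloneEvaluation {

  public static void main(String[] args) throws Exception {
    // Load ratings from a local file -- this reads the local filesystem,
    // not HDFS, which matches what the log output above suggests.
    DataModel model = new FileDataModel(new File("/tmp/ratings.txt"));

    // Tells the evaluator how to build the recommender under test;
    // swap in your own recommender implementation here.
    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel model) throws TasteException {
        return new SlopeOneRecommender(model);
      }
    };

    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

    // Train on 90% of each user's ratings (the "using 0.9" in the log),
    // predict the held-out 10%, for 100% of users.
    double score = evaluator.evaluate(builder, model, 0.9, 1.0);
    System.out.println(score);
  }
}
```

The printed score is the average absolute difference between predicted and actual ratings on the held-out data, so lower is better; the 0.7035... in the log means predictions were off by about 0.7 rating points on average.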
