Yes, there are a few components here -- a few different purposes. All
are built around the core library, which isn't specific to Hadoop or
an HTTP server; you've already seen some of the components that adapt
the core to those contexts. There are also components that can
evaluate or load-test the code.
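
For reference, here is a minimal sketch of using the core library on
its own -- no Hadoop, no servlet container. It assumes a local
comma-separated ratings file and a user-based recommender; exact
method signatures have shifted a bit between releases, so treat it as
illustrative:

  import java.io.File;
  import java.util.List;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
  import org.apache.mahout.cf.taste.recommender.RecommendedItem;
  import org.apache.mahout.cf.taste.recommender.Recommender;
  import org.apache.mahout.cf.taste.similarity.UserSimilarity;

  public class CoreTasteSketch {
    public static void main(String[] args) throws Exception {
      // Plain local file of "userID,itemID,preference" lines
      DataModel model = new FileDataModel(new File("/tmp/ratings.txt"));
      UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
      UserNeighborhood neighborhood =
          new NearestNUserNeighborhood(10, similarity, model);
      Recommender recommender =
          new GenericUserBasedRecommender(model, neighborhood, similarity);
      // Top 5 recommendations for user 123
      List<RecommendedItem> recs = recommender.recommend(123, 5);
      System.out.println(recs);
    }
  }

This is also where a custom algorithm would plug in: implement the
Recommender interface and the rest of the framework doesn't care
what's inside.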

The only piece you are interested in, then, is really the Hadoop
integration -- see org.apache.mahout.cf.taste.hadoop. There you will
find RecommenderJob, which should be able to launch a
pseudo-distributed recommender job. I say pseudo because these
algorithms are not, in general, distributable, but one can of course
run n instances of a recommender and have each compute 1/nth of all
recommendations. That helps, though it means that, for example, the
amount of RAM each job consumes is still limited by the memory of a
single machine.
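
To make that concrete, here is a sketch of the sharding idea -- this
is not the actual RecommenderJob code, and the helper method and its
parameters are invented for illustration:

  import java.util.List;
  import org.apache.mahout.cf.taste.recommender.RecommendedItem;
  import org.apache.mahout.cf.taste.recommender.Recommender;

  public class RecommenderShard {
    // Worker i of n handles the users whose IDs fall in its shard, so
    // n workers together cover all users exactly once. Note that each
    // worker still loads the entire data model into its own RAM.
    static void computeShard(Recommender recommender,
                             List<Long> allUserIDs,
                             int workerIndex,
                             int numWorkers) throws Exception {
      for (long userID : allUserIDs) {
        if (userID % numWorkers == workerIndex) {
          List<RecommendedItem> recs = recommender.recommend(userID, 10);
          // ... write recs to wherever the output should go
        }
      }
    }
  }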

I just recently rewrote this package to be compatible with Hadoop
0.20's new APIs. I do not yet know that it works, and I have some
reason to believe there are bugs in those APIs that will prevent it
from working, so this piece is currently in flux.
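
For context, the rewrite means moving from the old
org.apache.hadoop.mapred interfaces to the new
org.apache.hadoop.mapreduce classes. A mapper under the 0.20 API looks
roughly like this -- the key/value types here are illustrative, not
the actual job's:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // New-style (0.20) mapper: extend the Mapper class rather than
  // implement the old mapred.Mapper interface, and emit via a Context.
  public class ExampleMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // One input record per call, e.g. one preference line
      context.write(key, value);
    }
  }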

If you want to experiment and be a guinea pig for this latest
revision, I can provide close support to work through the bugs on both
sides. Or we can talk about your requirements a bit more to figure out
whether this is feasible, what the best algorithm is, and whether you
need Hadoop at all.

How big is 'massive'? Could you reveal how many users, items, and
user-item preferences you have, to an order of magnitude? And what is
the general nature of the input data, and of the recommendations you
want out?
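
For what it's worth, the simplest input the framework's FileDataModel
consumes is one preference per line, comma-separated as
userID,itemID,preference -- for example (made-up values):

  123,456,3.0
  123,789,5.0
  124,456,4.5

If your data can be dumped into roughly that shape, getting it into
the framework is the easy part.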

On Wed, Jul 22, 2009 at 12:12 AM, Aurora
Skarra-Gallagher<[email protected]> wrote:
> Hi,
>
> I apologize if I've misunderstood the purpose of the Taste component of 
> Mahout. Our goal was to take a recommendation framework and use our own 
> recommendation algorithm within it. We need to process a massive amount of 
> data, and wanted it to be done on our Hadoop grid. I thought that Taste was 
> the right fit for the job. I'm not interested in the HTTP service. I'm 
> interested in the recommendation framework, particularly from a back-end 
> batch perspective. Does that help clarify? Thanks for helping me sort through 
> this.
>
> -Aurora
>
>
> On 7/21/09 3:02 PM, "Sean Owen" <[email protected]> wrote:
>
> Hmm, there's a lot going on here; it's confusing.
>
> Are you trying to run this on Hadoop intentionally? Because the web
> app example is not intended to run on Hadoop. It's a component
> intended to serve recommendations over HTTP in real time. It also
> appears you are running an evaluation rather than a web app serving
> requests. I realize you're trying to run this without Jetty, but
> that's kind of like trying to run a web app without a web server.
>
> I think you'd have to clarify what you are trying to do, and what you
> are doing right now, before I can really begin to assist.
>
> On Tue, Jul 21, 2009 at 9:20 PM, Aurora
> Skarra-Gallagher<[email protected]> wrote:
>> Hi,
>>
>> I'm trying to run the taste web example without using jetty. Our gateways 
>> aren't meant to be used as webservers. By poking around, I found that the 
>> following command worked:
>> hadoop --config ~/hod-clusters/test jar 
>> /x/mahout-current/examples/target/mahout-examples-0.2-SNAPSHOT.job 
>> org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner
>>
>> The output is:
>> 09/07/21 19:59:21 INFO file.FileDataModel: Creating FileDataModel for file 
>> /tmp/ratings.txt
>> 09/07/21 19:59:21 INFO eval.AbstractDifferenceRecommenderEvaluator: 
>> Beginning evaluation using 0.9 of GroupLensDataModel
>> 09/07/21 19:59:22 INFO file.FileDataModel: Reading file info...
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 100000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 200000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 300000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 400000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 500000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 600000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 700000 lines
>> 09/07/21 19:59:22 INFO file.FileDataModel: Processed 800000 lines
>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 900000 lines
>> 09/07/21 19:59:23 INFO file.FileDataModel: Processed 1000000 lines
>> 09/07/21 19:59:23 INFO file.FileDataModel: Read lines: 1000209
>> 09/07/21 19:59:30 INFO slopeone.MemoryDiffStorage: Building average diffs...
>> 09/07/21 19:59:42 INFO eval.AbstractDifferenceRecommenderEvaluator: 
>> Evaluation result: 0.7035965559003973
>> 09/07/21 19:59:42 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 
>> 0.7035965559003973
>>
>> The job appears to write data to /tmp/ratings.txt and /tmp/movies.txt. I'm 
>> not sure if this is the correct way to run this example. I have a few 
>> questions:
>>
>>  1.  Is the output file /tmp/ratings.txt? If so, how do I interpret it?
>>  2.  What does the Evaluation result mean?
>>  3.  Is it even running on HDFS?
>>  4.  Is it a map-reduce job?
>>
>> Any pointers on how to run this as a standalone job would be helpful.
>>
>> Thanks,
>> Aurora
>>
>
>
