Honestly, it is hard to do recommendations in real time; most algorithms don't scale and don't parallelize easily. I've recommended that most people just recompute recommendations offline periodically.
Of course anything that's an online recommender can be used offline, and I don't think the code is or must be designed with one or the other in mind.

Yes, I don't think one can make a real-time online system out of Hadoop; that's not the idea. I think it can be used to crudely parallelize offline computation of this sort... which is better than no parallelization. And then pieces of particular algorithms, like slope-one, can be very much parallelized.

Bottom line, I think the existing code can continue to provide online recommendations, which could be useful for small- to medium-sized data sets, and can cleanly support computations in Hadoop. No redesign should be needed. After the code is committed, let me provide some examples of what I mean (a rough sketch of the slope-one case follows below the quoted message).

Sean

On Sat, Apr 19, 2008 at 6:35 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Sean Owen wrote:
> >
> > Yeah, it should be easy and fine to separate the EJB and web service clients further. Beyond that I think it's mostly driven by what we want to achieve, and it sounds like that is Hadoop-ifying it, basically.
> >
>
> So far, I was always thinking of Mahout as a backend process. It would produce a file (or two), and that would be sucked up into Solr or MySQL (or whatever) that the webapp would make use of. Obviously this is a PITA, as it introduces a delay in how long it takes before an event gets fed back into the system.
>
> Mainly because we (our developers) know how to scale Solr and MySQL very easily, and making a Hadoop cluster into an OLTP thing is completely new to us, and I was thinking it was not really designed for 10-30ms response times.
>
> Or am I misjudging HDFS? Could you run a webserver farm serving lots of static files on top of HDFS?
> --Ian
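
To make the slope-one point concrete, here is a rough sketch of how the item-item "diff" computation could be farmed out to Hadoop as a single map/reduce pass. This is not existing Mahout code and it uses the newer org.apache.hadoop.mapreduce API rather than the API of that era; the input format (one pre-grouped line per user, a user ID followed by item:rating pairs) and all class and job names are assumptions made purely for illustration.

// Hypothetical sketch, not Mahout code: computes the slope-one average
// item-item rating differences ("diffs") in one Hadoop job.
// Assumed input, one line per user: userID<TAB>item1:rating1 item2:rating2 ...
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SlopeOneDiffsJob {

  // Emits one (itemA|itemB, ratingA - ratingB) record per item pair a user rated.
  public static class DiffMapper
      extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] tokens = value.toString().split("\\s+");
      // tokens[0] is the user ID; the rest are item:rating entries
      for (int i = 1; i < tokens.length; i++) {
        String[] a = tokens[i].split(":");
        for (int j = i + 1; j < tokens.length; j++) {
          String[] b = tokens[j].split(":");
          String itemA = a[0];
          String itemB = b[0];
          double diff = Double.parseDouble(a[1]) - Double.parseDouble(b[1]);
          // Key each pair in a canonical order so (A,B) and (B,A) aggregate together
          if (itemA.compareTo(itemB) > 0) {
            String tmp = itemA;
            itemA = itemB;
            itemB = tmp;
            diff = -diff;
          }
          context.write(new Text(itemA + "|" + itemB), new DoubleWritable(diff));
        }
      }
    }
  }

  // Averages the rating differences for each item pair.
  public static class AverageReducer
      extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0.0;
      long count = 0;
      for (DoubleWritable v : values) {
        sum += v.get();
        count++;
      }
      context.write(key, new DoubleWritable(sum / count));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "slope-one diffs");
    job.setJarByClass(SlopeOneDiffsJob.class);
    job.setMapperClass(DiffMapper.class);
    job.setReducerClass(AverageReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The point is only that each user's line can be mapped independently and each item pair reduced independently, so the expensive diff-matrix build parallelizes cleanly; the resulting averages could then be loaded into whatever store the webapp serves from (Solr, MySQL, etc.) for fast lookups.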
