Honestly, it is hard to do recommendations in real time; most
algorithms don't scale and don't parallelize easily. I've recommended
that most people just recompute recommendations offline periodically.
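
To give a rough idea of what I mean, here is a hypothetical sketch (the
class and method names are made up for illustration, not anything in
the codebase): a background task rebuilds the full set of
recommendations on a schedule and atomically swaps it in, so the
serving path only ever does a map lookup.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: recompute everything offline on a schedule and
// swap the new result in atomically for the serving path to read.
public class PeriodicRecommendations {

  // userID -> precomputed item IDs, replaced wholesale on each rebuild
  private final AtomicReference<Map<Long, List<Long>>> current =
      new AtomicReference<Map<Long, List<Long>>>(
          new ConcurrentHashMap<Long, List<Long>>());

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start() {
    // the one-hour interval is arbitrary for this sketch
    scheduler.scheduleAtFixedRate(new Runnable() {
      public void run() {
        current.set(recomputeAll());
      }
    }, 0, 1, TimeUnit.HOURS);
  }

  public List<Long> recommendationsFor(long userID) {
    List<Long> items = current.get().get(userID);
    return items == null ? Collections.<Long>emptyList() : items;
  }

  private Map<Long, List<Long>> recomputeAll() {
    // stand-in for the real batch computation (a local run, the output
    // of a Hadoop job, etc.)
    return new ConcurrentHashMap<Long, List<Long>>();
  }
}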

Of course, anything that works as an online recommender can also be
used offline, and I don't think the code is, or must be, designed with
one or the other in mind.

Yes, I don't think one can make a real-time online system out of
Hadoop; that's not the idea. I think it can be used to crudely
parallelize offline computation of this sort, which is better than no
parallelization.

And then I think pieces of particular algorithms, like slope-one, can
be parallelized very effectively.
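
For example, here is a rough sketch of the slope-one "diff" stage as a
map/reduce pass, using the old org.apache.hadoop.mapred API; the input
format and class names are invented for the illustration. The map step
emits a rating difference for every pair of items a user has rated, and
the reduce step averages the differences per item pair. Since each pair
is independent, this stage splits cleanly across machines.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sketch of the slope-one diff computation. Each input line is assumed
// to be "userID,itemA:ratingA,itemB:ratingB,..." -- one user per line.
public class SlopeOneDiffs {

  public static class DiffMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, DoubleWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, DoubleWritable> output,
                    Reporter reporter) throws IOException {
      String[] fields = value.toString().split(",");
      // fields[0] is the user ID; the rest are item:rating tokens
      for (int i = 1; i < fields.length; i++) {
        for (int j = i + 1; j < fields.length; j++) {
          String[] a = fields[i].split(":");
          String[] b = fields[j].split(":");
          String itemA = a[0];
          String itemB = b[0];
          double diff = Double.parseDouble(a[1]) - Double.parseDouble(b[1]);
          // emit under a canonical key so (A,B) and (B,A) aggregate together
          if (itemA.compareTo(itemB) > 0) {
            String tmp = itemA; itemA = itemB; itemB = tmp;
            diff = -diff;
          }
          output.collect(new Text(itemA + ":" + itemB),
                         new DoubleWritable(diff));
        }
      }
    }
  }

  public static class AverageReducer extends MapReduceBase
      implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text itemPair, Iterator<DoubleWritable> diffs,
                       OutputCollector<Text, DoubleWritable> output,
                       Reporter reporter) throws IOException {
      double sum = 0.0;
      int count = 0;
      while (diffs.hasNext()) {
        sum += diffs.next().get();
        count++;
      }
      // average rating difference for this item pair
      output.collect(itemPair, new DoubleWritable(sum / count));
    }
  }
}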

Bottom line: I think the existing code can continue to provide online
recommendations, which could be useful for small- to medium-sized data
sets, and can cleanly support computations in Hadoop. No redesign
should be needed. After the code is committed, let me provide some
examples of what I mean.
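
As a preview, the online path looks roughly like the sketch below. It
is only a sketch: the class names follow the Taste-style API as it
appears in later releases, so treat the exact names and signatures as
approximate, and the ratings file name is made up.

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Sketch: answer recommendation requests on-line from an in-memory
// model, which is practical for small- to medium-sized data sets.
public class OnlineRecommenderExample {
  public static void main(String[] args) throws Exception {
    // ratings.csv: userID,itemID,preference -- small enough for memory
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(10, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);

    // answer a request in real time: top 5 items for user 1234
    List<RecommendedItem> items = recommender.recommend(1234L, 5);
    for (RecommendedItem item : items) {
      System.out.println(item.getItemID() + " " + item.getValue());
    }
  }
}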

Sean

On Sat, Apr 19, 2008 at 6:35 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Sean Owen wrote:
>
> > Yeah, it should be easy and fine to separate the EJB and web service
> > clients further. Beyond that I think it's mostly driven by what we
> > want to achieve, and it sounds like that is basically Hadoop-ifying
> > it.
> >
> >
> >
>  So far, I was always thinking of Mahout as a backend process. It would
> produce a file (or two), and that would be sucked up into Solr or MySQL (or
> whatever) for the webapp to make use of. Obviously this is a PITA, as it
> introduces a delay before an event gets fed back into the system.
>
>  Mainly because we (our developers) know how to scale Solr and MySQL very
> easily, whereas making a Hadoop cluster into an OLTP thing is completely new
> to us, and I was thinking it was not really designed for 10-30ms response
> times.
>
>
>  Or am I misjudging HDFS? Could you run a webserver farm serving lots of
> static files on top of HDFS?
>  --Ian
>
