Given the speedups that can often be achieved by efficient disk access in a
large batch computation, this is more reasonable than it sounds on the face
of it.  The example of updating 1% of the records in a 1TB database
demonstrates how doing 100x more work (rewriting the entire database) can be
done about 100x faster (5-6 hours instead of 30 days), which works out to
roughly 10,000x less time per record.  If you can attain comparable speedups
and if more than 0.01% (1 in 10,000) of your audience comes back to view the
recommendations that you have pre-computed, then you win by building in
batch off-line.
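The break-even arithmetic can be spelled out in a few lines. This is a minimal sketch. The function names are hypothetical. It assumes that computing one user's recommendations on demand costs about as much per record as the random-access update in the 1TB example. Under that assumption, batch wins whenever the returning fraction of users exceeds one over the per-record speedup.

```python
# Hypothetical helpers illustrating the break-even arithmetic in the email.
# The ratios are the email's rounded figures, not measurements.

def per_record_speedup(work_ratio: float, time_ratio: float) -> float:
    """Batch does `work_ratio` times more work in 1/`time_ratio` of the
    wall-clock time, so each record is work_ratio * time_ratio cheaper."""
    return work_ratio * time_ratio


def break_even_fraction(work_ratio: float, time_ratio: float) -> float:
    """Precomputing for everyone costs as much as computing on demand for
    this fraction of users; batch wins if more users than this return."""
    return 1.0 / per_record_speedup(work_ratio, time_ratio)


if __name__ == "__main__":
    # 100x more work (whole database vs. 1% of it), done ~100x faster.
    print(per_record_speedup(100, 100))            # 10000
    print(f"{break_even_fraction(100, 100):.2%}")  # 0.01%
```

The exact figures (30 days is 720 hours, so 5-6 hours is 120-144x faster) only move the break-even point further in batch's favour.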

Of course, if some clever spark figures out a way to do most of the work in
a batch and then just add water to the dehydrated recommendations in real
time, you win even more, because the recommendations can then change in real
time.  I can't comment further than that.
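The email deliberately leaves the "add water" step unspecified. The sketch below is therefore only one generic way to split the work, not the method hinted at. It assumes an item-based approach. An expensive batch job counts item co-occurrences across all users and keeps a short neighbor list per item. A cheap per-request step then sums those precomputed neighbors over whatever the user has just looked at. Both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def batch_item_neighbors(histories, top_k=10):
    """Offline (batch) step: count how often each pair of items appears in
    the same user's history and keep the top_k neighbors for every item."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    neighbors = {}
    for item, row in counts.items():
        # Highest co-occurrence first; ties broken by item id for stability.
        ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
        neighbors[item] = ranked[:top_k]
    return neighbors


def realtime_recommend(neighbors, recent_items, n=5):
    """Online (real-time) step: a cheap lookup-and-sum over the precomputed
    neighbor lists, so results react to the user's latest activity."""
    seen = set(recent_items)
    scores = defaultdict(int)
    for item in recent_items:
        for other, weight in neighbors.get(item, []):
            if other not in seen:
                scores[other] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


if __name__ == "__main__":
    histories = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["b", "d"]}
    nb = batch_item_neighbors(histories)    # expensive, run periodically
    print(realtime_recommend(nb, ["a"]))       # ['b', 'c']
    print(realtime_recommend(nb, ["a", "b"]))  # ['c', 'd']
```

The design point is that all the quadratic pair counting happens off-line, while the per-request work is bounded by the number of recent items times `top_k`.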

On Sat, Apr 19, 2008 at 6:12 PM, Sean Owen <[EMAIL PROTECTED]> wrote:

> Honestly it is hard to do recommendations in real-time; most
> algorithms don't scale and don't parallelize easily. I've recommended
> to most people to just recompute recommendations offline periodically.
-- 
ted
