Given the speedups that can often be achieved by efficient disk access in a large batch computation, this is more reasonable than it sounds on the face of it. The example of updating 1% of the records in a 1TB database demonstrates how 100x more work (rewriting the entire database) can be done about 100x faster (5-6 hours instead of 30 days). Taken together, that makes the batch job roughly 10,000x cheaper per record. If you can attain comparable speedups and if more than 1 in 10,000 (0.01%) of your audience comes back to view the recommendations that you have pre-computed, then you win by building in batch offline.
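To make the break-even arithmetic concrete, here is a rough sketch in Python. The function name and parameters are my own for illustration, and it assumes the only costs that matter are the per-record costs implied by the two ratios above.

```python
def break_even_fraction(work_ratio: float, time_ratio: float) -> float:
    """Fraction of the audience that must return for batch to win.

    work_ratio: how many times more records the batch job touches
                than the on-demand approach would (100 in the example).
    time_ratio: how many times faster the batch job finishes overall
                (about 100 in the example).
    """
    # Doing work_ratio times the work in 1/time_ratio of the time means
    # each record costs (work_ratio * time_ratio) times less in batch.
    per_record_speedup = work_ratio * time_ratio
    # Batch pays for every record; on-demand pays only for returning users.
    # Batch wins once the returning fraction exceeds 1 / per_record_speedup.
    return 1.0 / per_record_speedup


if __name__ == "__main__":
    frac = break_even_fraction(work_ratio=100, time_ratio=100)
    print(f"break-even fraction: {frac:.4%}")
```

With the numbers from the database example (100x the work, about 100x faster), this gives the 0.01% threshold. Plug in your own measured ratios to see where your break-even point falls.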
Of course, if some clever spark figures out a way to do most of the work in a batch and then just add water to the dehydrated recommendations in real time you win even more because recommendations can change in real-time. I can't comment further than that.

On Sat, Apr 19, 2008 at 6:12 PM, Sean Owen <[EMAIL PROTECTED]> wrote:
> Honestly it is hard to do recommendations in real-time; most
> algorithms don't scale and don't parallelize easily. I've recommended
> to most people to just recompute recommendations offline periodically.
>

-- ted
