On Sat, Dec 12, 2009 at 3:16 PM, Sean Owen <[email protected]> wrote:
> Recommendations are computed for one user at a time, by multiplying the
> co-occurrence matrix by the user preference vector. And then yes, it's one
> big job invoking computation for all users. I'm running this all on one
> machine (my laptop) so it's kind of serialized anyway. Yes, it was 10
> seconds to compute all recs for one user; it's a couple of seconds now
> with some more work. That's still rough but not awful.

So when doing a big batch of a thousand users, say, you're saying it's
taking your laptop 3 hours to do this using the Hadoop-based code (in
pseudo-distributed mode)?

> All of it is on Hadoop here. It's pretty simple -- make the user vectors,
> make the co-occurrence matrix (all that is quite fast), then multiply the
> two to make recommendations.

You do the co-occurrence matrix (for item-by-item, right?) on Hadoop too,
and that part is really fast, but computing the recommendations is very
slow? By what orders of magnitude, for the whole set? What are the scales
you are testing with, in terms of total number of users, items, and
ratings?

  -jake
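For reference, the per-user step being discussed above is just a
matrix-vector product: the item-item co-occurrence matrix times the user's
preference vector. A minimal sketch in plain Java with dense arrays and
hypothetical data -- not the actual Mahout Hadoop job, which works on
sparse vectors across many map tasks:

// Sketch only: score items for one user by multiplying an item-item
// co-occurrence matrix by that user's preference vector.
public class CooccurrenceSketch {
  public static void main(String[] args) {
    // Hypothetical data: cooccurrence[i][j] = number of users who
    // expressed a preference for both item i and item j.
    double[][] cooccurrence = {
        {0, 3, 1, 0},
        {3, 0, 2, 1},
        {1, 2, 0, 4},
        {0, 1, 4, 0},
    };
    // User preference vector: nonzero entries are items this user rated.
    double[] prefs = {4.0, 0.0, 5.0, 0.0};

    int n = prefs.length;
    double[] scores = new double[n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        scores[i] += cooccurrence[i][j] * prefs[j];
      }
    }
    // Candidate recommendations: highest-scoring items not yet rated.
    for (int i = 0; i < n; i++) {
      if (prefs[i] == 0.0) {
        System.out.printf("item %d: score %.1f%n", i, scores[i]);
      }
    }
  }
}

Doing this naively for every user is O(users * items^2) in the dense case,
which is why the sparsity of both the co-occurrence matrix and the
preference vectors matters so much for the batch job's running time.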
