I normally deal with this by purposely limiting the length of these rows.
 The argument is that if I never recommend more than 100 items to a person
(or 20, or 1000 ... the argument doesn't change), then none of the item ->
item* mappings needs more than 100 entries, since the tail of the list
can't affect the top 100 recommendations anyway.  It is also useful to
limit the user history to only recent or only important ratings.  That
means a typical big multi-get is something like 100 history items x 100
related items = 10,000 entries x 10 bytes for id+score, or roughly 100 KB
per query.  That sounds kind of big, but the average case is about 5x
smaller.
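To make the capping argument concrete, here is a minimal sketch (not Mahout
code; the function and variable names are my own invention) of truncating
each item -> item* row to the top 100 entries and then scoring against a
user history:

```python
from heapq import nlargest

CAP = 100  # never recommend more than CAP items, so rows can be capped at CAP


def cap_row(similar_items, cap=CAP):
    """Keep only the top-`cap` (item, score) pairs of one item -> item* row.

    Anything past position `cap` can never reach the top-`cap`
    recommendations, so it is safe to drop before storing the row.
    """
    return nlargest(cap, similar_items, key=lambda pair: pair[1])


def recommend(history, rows, cap=CAP):
    """Aggregate capped similarity rows over the (capped) user history.

    `rows` maps each item id to its capped list of (item, score) pairs;
    this is the multi-get described above: len(history) row fetches.
    """
    scores = {}
    for item in history:
        for other, score in rows.get(item, []):
            scores[other] = scores.get(other, 0.0) + score
    return nlargest(cap, scores.items(), key=lambda pair: pair[1])
```

With 100 history items and 100 entries per row, the multi-get touches at
most 10,000 (id, score) pairs regardless of how dense the full similarity
matrix is.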

On Mon, May 31, 2010 at 4:01 PM, Sean Owen <[email protected]> wrote:

> I'd be a little concerned about whether this fits comfortably in
> memory. The similarity matrix is potentially dense -- big rows -- and
> you're loading one row per item the user has rated. It could get into
> tens of megabytes for one query. The distributed version dares not do
> this. But, worth a try in principle.
>
