Wouldn't using Long values as the column names for the 3rd CF cause
conflicts if 2 users liked the same number of items? (Only one user
would be saved for any given value.)

I was thinking about this same problem (sorted lists of top-N user
activity) and thought that was a roadblock for that design.
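
To make the concern concrete, here is a minimal sketch in plain Python, not
Cassandra client code. A row is modelled as a dict from column name to
column value, and write_column is a hypothetical helper. The composite
(count, user id) column name in the second half is only one possible
workaround, not something proposed in this thread, and how to encode such a
name in Cassandra is left open.

```python
# Hypothetical in-memory stand-in for one row of a column family:
# a dict mapping column name -> column value.

def write_column(row, name, value):
    """Writing to an existing column name overwrites its value."""
    row[name] = value

# Column name = like count only: the second user replaces the first.
by_count = {}
write_column(by_count, 42, "alice")
write_column(by_count, 42, "bob")

# Column name = (like count, user id): both users survive, and sorting
# the names still orders users by count (ties broken by user id).
by_count_and_user = {}
write_column(by_count_and_user, (42, "alice"), "")
write_column(by_count_and_user, (42, "bob"), "")
write_column(by_count_and_user, (57, "carol"), "")

# Highest counts first.
top = sorted(by_count_and_user, reverse=True)
```

With the count alone as the name, only one of the two users with 42 likes
is left in the row. With the composite name, both remain.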


On Mon, Mar 8, 2010 at 7:33 PM, Jonathan Ellis wrote:

> On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari wrote:
> > The 'key' queries are:
> These map straightforwardly to one CF per query.
> > - list all the items a user liked
> row key is user id, column names are timeuuid of when the like-ing
> occurred, column value is either item id, or a supercolumn containing
> the denormalized item data
> > - list all the users that liked an item
> row key is item id, column names are same timeuuids, values are either
> user id or again denormalized
> > - list all users and count how many items each user liked
> > (we need this every few hours and in fact we are only interested in
> > the top N users that liked most stuff)
> row key is something you hardcode ("topusers"), column names are Long
> values of how many liked, column value is user id or denormalized user
> data
> If you just need it every few hours, run a map/reduce job (Hadoop
> integration in 0.6) to compute this that often.  Otherwise you will
> have to update it on each insert for each user which is probably a bad
> idea if you have millions of users (all that activity will go to just
> the machines replicating that row).  And if you have tens of millions
> of users you will almost certainly run into the
> row-must-fit-in-memory-during-compaction limitation that we're
> removing in 0.7.
> -Jonathan
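
For the "compute it every few hours" option, the batch step amounts to
counting likes per user and keeping the N largest. Below is a rough sketch
in plain Python rather than an actual Hadoop job. The function top_users
and the (user id, item id) input pairs are assumptions made for
illustration; a real job would read them from the per-user or per-item CF.

```python
import heapq
from collections import Counter

def top_users(likes, n):
    """Return the n (user_id, like_count) pairs with the highest counts.

    likes: iterable of (user_id, item_id) pairs (hypothetical input shape).
    """
    # "Map" and "reduce" collapsed into one step: count likes per user.
    counts = Counter(user_id for user_id, _item_id in likes)
    # Keep only the n largest counts instead of sorting every user.
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

sample_likes = [
    ("alice", "i1"), ("bob", "i1"), ("alice", "i2"),
    ("carol", "i3"), ("alice", "i3"), ("bob", "i2"),
]
result = top_users(sample_likes, 2)
```

Because the result is rebuilt from scratch on each run, users with equal
counts do not overwrite each other the way they would as Long column names
in a single row.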
