On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari <matteo.capr...@gmail.com> wrote:
> The 'key' queries are:

These map straightforwardly to one CF per query.

> - list all the items a user liked

Row key is the user id, column names are timeuuids of when the like occurred, and the column value is either the item id or a supercolumn containing the denormalized item data.

> - list all the users that liked an item

Row key is the item id, column names are the same timeuuids, and values are either the user id or, again, denormalized user data.

> - list all users and count how many items each user liked
> (we need this every few hours and in fact we are only interested in
> the top N users that liked most stuff)

Row key is something you hardcode ("topusers"), column names are Long values of how many items were liked, and the column value is the user id or denormalized user data.

If you just need it every few hours, run a map/reduce job (Hadoop integration in 0.6) to compute this that often. Otherwise you will have to update it on each insert for each user, which is probably a bad idea if you have millions of users (all that activity will go to just the machines replicating that row). And if you have tens of millions of users you will almost certainly run into the row-must-fit-in-memory-during-compaction limitation that we're removing in 0.7.

-Jonathan
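
For illustration only, the layout described above can be sketched in plain Python, with dictionaries standing in for column families. This does not use any Cassandra client API: the names `user_likes`, `item_likers`, `record_like`, `items_liked_by`, `users_who_liked`, and `top_users` are all hypothetical, and the batch recount in `top_users` is only a stand-in for the periodic map/reduce job, not Hadoop itself.

```python
import heapq
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
# Shape: row key -> {column name -> column value}
user_likes = defaultdict(dict)   # row key: user id; column: timeuuid -> item id
item_likers = defaultdict(dict)  # row key: item id; column: timeuuid -> user id


def record_like(user_id, item_id):
    """Write one like into both column families under the same timeuuid."""
    ts = uuid.uuid1()  # version-1 (time-based) UUID, playing the role of a timeuuid
    user_likes[user_id][ts] = item_id
    item_likers[item_id][ts] = user_id
    return ts


def items_liked_by(user_id):
    """Items a user liked, with columns ordered by timeuuid timestamp."""
    row = user_likes.get(user_id, {})
    return [row[ts] for ts in sorted(row, key=lambda u: u.time)]


def users_who_liked(item_id):
    """Users that liked an item, with columns ordered by timeuuid timestamp."""
    row = item_likers.get(item_id, {})
    return [row[ts] for ts in sorted(row, key=lambda u: u.time)]


def top_users(n):
    """Recount every user's likes in one batch pass and keep the top n.

    Returns (like count, user id) pairs, mirroring the "topusers" row where
    the column name is the count and the column value is the user id.
    """
    counts = ((len(cols), user_id) for user_id, cols in user_likes.items())
    return heapq.nlargest(n, counts)
```

Calling `record_like` once per like keeps both query rows in step, while `top_users` is meant to be run only every few hours, so no single hot row is updated on every insert.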