Re: schema design question

Jonathan Ellis Mon, 08 Mar 2010 19:34:43 -0800

On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari <matteo.capr...@gmail.com> wrote:
> The 'key' queries are:


These map straightforwardly to one CF (column family) per query.

> - list all the items a user liked

row key is user id, column names are timeuuids of when the like
occurred, column value is either the item id or a supercolumn
containing the denormalized item data

> - list all the users that liked an item

row key is item id, column names are the same timeuuids, values are
either the user id or, again, the denormalized user data
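
To make the two layouts above concrete, here is a rough sketch in
Python. It is not Cassandra client code: plain dicts stand in for the
two column families, and the names user_likes, item_likers and
record_like are made up for illustration. It only shows the idea that
each like is written twice, once per query, under the same timeuuid.

```python
import uuid

# In-memory stand-ins for the two column families (hypothetical names).
# Each maps row key -> {column name (timeuuid) -> column value}.
user_likes = {}   # row key: user id, column value: item id
item_likers = {}  # row key: item id, column value: user id


def record_like(user_id, item_id):
    """Write one like into both CFs under the same timeuuid."""
    ts = uuid.uuid1()  # version-1 (time-based) UUID
    user_likes.setdefault(user_id, {})[ts] = item_id
    item_likers.setdefault(item_id, {})[ts] = user_id
    return ts


def by_time(columns):
    """Return column values ordered by the timestamp inside each timeuuid."""
    ordered = sorted(columns.items(), key=lambda kv: kv[0].time)
    return [value for _, value in ordered]


def items_liked_by(user_id):
    """Query 1: read a single row of the per-user CF."""
    return by_time(user_likes.get(user_id, {}))


def users_who_liked(item_id):
    """Query 2: read a single row of the per-item CF."""
    return by_time(item_likers.get(item_id, {}))
```

Each query is then a read of one row, e.g. items_liked_by("alice")
after a few record_like calls. Storing denormalized item or user data
instead of the bare id would only change the column values.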

> - list all users and count how many items each user liked
> (we need this every few hours and in fact we are only interested in
> the top N users that liked most stuff)

row key is something you hardcode ("topusers"), column names are Long
values of how many liked, column value is user id or denormalized user
data

If you just need it every few hours, run a map/reduce job (Hadoop
integration in 0.6) to recompute it on that schedule.  Otherwise you
will have to update it on each insert for each user, which is probably
a bad idea if you have millions of users (all that activity will go to
just the machines replicating that row).  And if you have tens of
millions of users you will almost certainly run into the
row-must-fit-in-memory-during-compaction limitation that we're
removing in 0.7.
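
As a rough illustration of what the periodic job computes, here is a
single-process Python sketch. It is not a Hadoop job; top_users and
the (user_id, item_id) input pairs are assumptions made for the
example. It counts likes per user and keeps only the top N. The sketch
keeps (count, user id) pairs rather than keying on the count alone,
since column names within a row are unique and two users with the same
count would otherwise collide.

```python
import heapq
from collections import Counter


def top_users(likes, n):
    """Return the n users with the most likes as (count, user_id) pairs.

    likes is an iterable of (user_id, item_id) pairs, standing in for a
    scan over the per-user column family.
    """
    # Count likes per user: the map and reduce steps in miniature.
    counts = Counter(user_id for user_id, _ in likes)
    # Keep only the n largest counts, highest first.
    return heapq.nlargest(n, ((c, u) for u, c in counts.items()))
```

The result of top_users(likes, n) would then be written as the columns
of the hardcoded "topusers" row, replacing the previous run's output.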

-Jonathan
