Re: schema design question

Keith Thornhill Mon, 08 Mar 2010 22:43:37 -0800

jonathan,

wouldn't using Long values as the column names for the 3rd CF cause
conflicts if 2 users liked the same # of items? (a row keeps only one
column per name, so only one user would be saved for any given count)


was thinking about this same problem (sorted lists of top N user activity)
and thought that was a roadblock for that design.
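
to make the conflict concrete, here is a minimal sketch in plain Python (not Cassandra client code). a row is modelled as a dict from column name to column value. the composite (count, user id) column name is only my assumption of one possible workaround, not something proposed in this thread, and all function names are hypothetical.

```python
# Hypothetical in-memory model of one row: column name -> column value.

def insert_by_count(row, like_count, user_id):
    # Column name is the like count alone: a second user with the
    # same count overwrites the first.
    row[like_count] = user_id

def insert_by_count_and_user(row, like_count, user_id):
    # Assumed workaround: make the column name unique per user by
    # pairing the count with the user id.
    row[(like_count, user_id)] = user_id

def top_n(row, n):
    # Column names sort by count first, so the largest names are the top users.
    return [row[name] for name in sorted(row, reverse=True)[:n]]

naive = {}
composite = {}
for count, user in [(42, "alice"), (42, "bob"), (7, "carol")]:
    insert_by_count(naive, count, user)
    insert_by_count_and_user(composite, count, user)

print(top_n(naive, 3))      # alice is lost: bob overwrote her column
print(top_n(composite, 3))  # all three users survive
```

with the plain count as the name, alice and bob share column 42 and only the last write is kept. with the pair as the name both columns exist and still sort by count.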

-keith

On Mon, Mar 8, 2010 at 7:33 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari <matteo.capr...@gmail.com>
> wrote:
> > The 'key' queries are:
>
> These map straightforwardly to one CF per query.
>
> > - list all the items a user liked
>
> row key is user id, column names are timeuuid of when the like-ing
> occurred, column value is either item id, or a supercolumn containing
> the denormalized item data
>
> > - list all the users that liked an item
>
> row key is item id, column names are same timeuuids, values are either
> user id or again denormalized
>
> > - list all users and count how many items each user liked
> > (we need this every few hours and in fact we are only interested in
> > the top N users that liked most stuff)
>
> row key is something you hardcode ("topusers"), column names are Long
> values of how many liked, column value is user id or denormalized user
> data
>
> If you just need it every few hours, run a map/reduce job (Hadoop
> integration in 0.6) to compute this that often.  Otherwise you will
> have to update it on each insert for each user which is probably a bad
> idea if you have millions of users (all that activity will go to just
> the machines replicating that row).  And if you have tens of millions
> of users you will almost certainly run into the
> row-must-fit-in-memory-during-compaction limitation that we're
> removing in 0.7.
>
> -Jonathan
>
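
for reference, a rough sketch of the layout quoted above, again in plain Python rather than Cassandra or Hadoop code. the two dicts stand in for the first two CFs, and top_users stands in for the periodic batch job that counts likes per user. every name here is hypothetical, and uuid.uuid1() is used only as a stand-in for a timeuuid.

```python
import heapq
import uuid

# Stand-ins for the first two column families.
likes_by_user = {}  # row key: user id -> {timeuuid: item id}
users_by_item = {}  # row key: item id -> {timeuuid: user id}

def record_like(user_id, item_id):
    # One like writes one column into each row, under the same timeuuid.
    ts = uuid.uuid1()
    likes_by_user.setdefault(user_id, {})[ts] = item_id
    users_by_item.setdefault(item_id, {})[ts] = user_id

def top_users(n):
    # Batch pass: count the columns in every user row, keep the n largest.
    counts = ((len(cols), user) for user, cols in likes_by_user.items())
    return heapq.nlargest(n, counts)

for user, item in [("alice", "a"), ("alice", "b"), ("alice", "c"),
                   ("bob", "a"), ("carol", "a"), ("carol", "b")]:
    record_like(user, item)

print(top_users(2))
```

because the counts are recomputed in one pass, nothing has to be updated in a single hot "topusers" row on every insert.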
