On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang <pete...@gmail.com> wrote:

> I'm wondering about good strategies for picking keys that I want to be
> lexically sorted in a super column family. For example, my data looks like
> this:
>
> [user1_uuid][connections][some_key_for_user2] = ""
> [user1_uuid][connections][some_key_for_user3] = ""
>
> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list. That's fine. But I
> wonder what would happen if, say, a user changes their last name. Happens
> rarely but I imagine people getting married and modifying their name. Now
> the sort is no longer correct. There seems to be some bad consequences to
> creating keys based on data that can change.
>
> So what is the general (elegant, easy to maintain) strategy here? Always
> sort in your server-side code and don't bother trying to have the data
> sorted?
>

Having row keys based on something potentially volatile is something I would
avoid since that determines which machine the row belongs to and moving data
between machines isn't a cheap operation.

What you'll probably want to do is make the key something unique (like a
uuid), store the user's name as a column on the row (thus making it easy to
update) and maintain a secondary index to get the named-based sorting you
want.  If you're expecting a few million users, maintaining the index in a
special row will work fine (eg, the row name is "NAMEINDEX" and the columns
are the name+uuid similar to what you described.)  If you have billions of
users, you'll need to get a bit fancier (partition based on letter of the
last name, for example.)

-Brandon

Reply via email to