But wouldn't name + UUID be considered volatile? That was the crux of my

On Fri, Mar 12, 2010 at 1:07 PM, Brandon Williams <dri...@gmail.com> wrote:

> On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang <pete...@gmail.com> wrote:
>> I'm wondering about good strategies for picking keys that I want to be
>> lexically sorted in a super column family. For example, my data looks like
>> this:
>> [user1_uuid][connections][some_key_for_user2] = ""
>> [user1_uuid][connections][some_key_for_user3] = ""
>> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
>> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key
>> [user's lastname + user's firstname + user's uuid]
>> This would result in sorted subcolumn and user list. That's fine. But I
>> wonder what would happen if, say, a user changes their last name. Happens
>> rarely but I imagine people getting married and modifying their name. Now
>> the sort is no longer correct. There seem to be some bad consequences to
>> creating keys based on data that can change.
>> So what is the general (elegant, easy to maintain) strategy here? Always
>> sort in your server-side code and don't bother trying to have the data
>> sorted?
> Having row keys based on something potentially volatile is something I
> would avoid since that determines which machine the row belongs to and
> moving data between machines isn't a cheap operation.
> What you'll probably want to do is make the key something unique (like a
> uuid), store the user's name as a column on the row (thus making it easy to
> update) and maintain a secondary index to get the name-based sorting you
> want.  If you're expecting a few million users, maintaining the index in a
> special row will work fine (eg, the row name is "NAMEINDEX" and the columns
> are the name+uuid similar to what you described.)  If you have billions of
> users, you'll need to get a bit fancier (partition based on letter of the
> last name, for example.)
> -Brandon
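
A rough sketch of the index-row idea described above, in plain Python rather
than against any real Cassandra client. Everything here is an assumption made
for illustration: the UserStore class, its method names, the lowercase
normalisation and the NUL separator are hypothetical, and a sorted list stands
in for the sorted columns of the "NAMEINDEX" row. The point it tries to show is
that the row key stays a stable uuid, while a name change only swaps one column
in the index row.

```python
import bisect
import uuid

# Assumed separator: a character that never appears in names, so that
# "lastname, firstname, uuid" sorts field by field.
SEP = "\x00"


class UserStore:
    """In-memory stand-in for a user column family plus one index row."""

    def __init__(self):
        # Row key (uuid string) -> columns holding the mutable name fields.
        self.rows = {}
        # Stands in for the sorted column names of the "NAMEINDEX" row.
        self.name_index = []

    @staticmethod
    def _index_column(last, first, user_id):
        # Column name of the form lastname + firstname + uuid.
        return SEP.join((last.lower(), first.lower(), user_id))

    def add_user(self, first, last):
        user_id = str(uuid.uuid4())  # stable row key, never derived from the name
        self.rows[user_id] = {"first": first, "last": last}
        bisect.insort(self.name_index, self._index_column(last, first, user_id))
        return user_id

    def rename(self, user_id, first, last):
        row = self.rows[user_id]
        # Remove the old index column, if present.
        old = self._index_column(row["last"], row["first"], user_id)
        pos = bisect.bisect_left(self.name_index, old)
        if pos < len(self.name_index) and self.name_index[pos] == old:
            del self.name_index[pos]
        # Update the name columns on the user's own row, then re-index.
        row["first"], row["last"] = first, last
        bisect.insort(self.name_index, self._index_column(last, first, user_id))

    def users_by_name(self):
        # The uuid is the last field of each index column.
        return [col.rsplit(SEP, 1)[1] for col in self.name_index]
```

With this layout the name + uuid value is still volatile, but it lives in a
column name inside the index row, not in a row key, so a rename is one column
delete plus one column insert and the user's row itself never moves.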
