On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang <pete...@gmail.com> wrote:
> I'm wondering about good strategies for picking keys that I want to be
> lexically sorted in a super column family. For example, my data looks like
> this:
>
> [user1_uuid][connections][some_key_for_user2] = ""
> [user1_uuid][connections][some_key_for_user3] = ""
>
> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list. That's fine. But I
> wonder what would happen if, say, a user changes their last name. Happens
> rarely but I imagine people getting married and modifying their name. Now
> the sort is no longer correct. There seems to be some bad consequences to
> creating keys based on data that can change.
>
> So what is the general (elegant, easy to maintain) strategy here? Always
> sort in your server-side code and don't bother trying to have the data
> sorted?

Having row keys based on something potentially volatile is something I
would avoid, since the row key determines which machine the row belongs to,
and moving data between machines isn't a cheap operation.

What you'll probably want to do is make the key something unique (like a
uuid), store the user's name as a column on the row (thus making it easy to
update), and maintain a secondary index to get the name-based sorting you
want.

If you're expecting a few million users, maintaining the index in a special
row will work fine (e.g., the row name is "NAMEINDEX" and the columns are
name+uuid, similar to what you described). If you have billions of users,
you'll need to get a bit fancier (partition based on the first letter of
the last name, for example).

-Brandon
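
For illustration, here is a minimal sketch of the index-row idea in Python. It models the column family as plain in-memory dicts rather than using a real Cassandra client, and the helper names (create_user, rename_user, users_by_name, index_column) and the "\x00" separator are assumptions of mine, not anything from the thread. The point is only that a name change touches one column on the user's row plus one delete and one insert in the index row, while the row key stays put.

```python
import uuid

# In-memory stand-in for a column family: row key -> {column name: value}.
# This is NOT a Cassandra client; it only models the data layout.
store = {}

INDEX_ROW = "NAMEINDEX"  # the special index row named in the reply


def index_column(last, first, user_id):
    # Hypothetical layout: lastname, firstname, uuid joined by "\x00".
    # "\x00" sorts before any printable character, so "Lee" precedes "Leeds".
    return "\x00".join((last.lower(), first.lower(), user_id))


def create_user(last, first):
    # The row key is a uuid, so it never changes when the name does.
    user_id = str(uuid.uuid4())
    store[user_id] = {"last": last, "first": first}
    store.setdefault(INDEX_ROW, {})[index_column(last, first, user_id)] = ""
    return user_id


def rename_user(user_id, last, first):
    row = store[user_id]
    index = store[INDEX_ROW]
    # Drop the stale index column, update the name, add the new index column.
    del index[index_column(row["last"], row["first"], user_id)]
    row["last"], row["first"] = last, first
    index[index_column(last, first, user_id)] = ""


def users_by_name():
    # Cassandra would hand back columns already ordered by the comparator;
    # this sketch sorts on read instead.
    return [col.rsplit("\x00", 1)[1] for col in sorted(store[INDEX_ROW])]
```

In a real cluster the delete and insert on the index row are separate writes, so a reader may briefly see the old entry, both, or neither. For the billions-of-users case, the same functions could pick the index row from the first letter of the last name (say "NAMEINDEX_" plus that letter, again a made-up naming scheme) instead of using a single row.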