Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:46 PM, Peter Chang  wrote:

> Yes, I can update that one entry. But what if that subcolumn key is used
> across many different places?
>
> ['Jones-Bob']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ['Crabtree-Sam']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ['Rice-Brown']['connections']
> ['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
> ...
>
> I can update every single entry but now I need to keep track of them (which
> I guess I'm doing anyway). I was wondering if there was a more elegant
> solution but it seems unlikely based on the given constraints.
>

You have to update them all and track them, correct.  What you're looking
for sounds like transaction support, which Cassandra does not have.  On the
bright side, writes are cheap.

-Brandon
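Brandon's answer (update every copy and track where each one lives) can be sketched in memory. This is not Cassandra client code: the `ConnectionStore` class, its `rows`/`current_key`/`referenced_by` maps, and `make_key` are hypothetical, and the "<last>-<first>-<uuid>" layout follows Peter's examples. Without transactions, the sketch makes the fan-out safe to re-run if it stops part-way.

```python
from collections import defaultdict


def make_key(last: str, first: str, user_uuid: str) -> str:
    # Hypothetical subcolumn-name layout from the thread: "<last>-<first>-<uuid>".
    return f"{last}-{first}-{user_uuid}"


class ConnectionStore:
    """In-memory stand-in for the super column family (not a Cassandra client).

    rows[row_key]["connections"] maps subcolumn name -> value.
    """

    def __init__(self) -> None:
        self.rows = defaultdict(lambda: {"connections": {}})
        # The bookkeeping the thread says you must keep anyway:
        self.current_key = {}                  # user uuid -> current subcolumn name
        self.referenced_by = defaultdict(set)  # user uuid -> row keys containing it

    def connect(self, row_key: str, last: str, first: str, user_uuid: str) -> None:
        key = make_key(last, first, user_uuid)
        self.rows[row_key]["connections"][key] = ""
        self.current_key[user_uuid] = key
        self.referenced_by[user_uuid].add(row_key)

    def rename(self, user_uuid: str, new_last: str, new_first: str) -> None:
        old_key = self.current_key[user_uuid]
        new_key = make_key(new_last, new_first, user_uuid)
        if new_key == old_key:
            return
        # No multi-row transactions, so each row is rewritten on its own.
        # Writing the new name before deleting the old one, and tolerating an
        # already-missing old name, makes the loop safe to re-run after a failure.
        for row_key in sorted(self.referenced_by[user_uuid]):
            cols = self.rows[row_key]["connections"]
            cols[new_key] = cols.get(old_key, "")
            cols.pop(old_key, None)
        # Record the new name only once every row has been rewritten.
        self.current_key[user_uuid] = new_key


store = ConnectionStore()
alyssa = "1ab54760-2ca8-11df-aabd-005056c1"
for row in ["Jones-Bob", "Crabtree-Sam", "Rice-Brown"]:
    store.connect(row, "Hacker", "Alyssa", alyssa)
store.rename(alyssa, "Zamboni", "Alyssa")
print(store.rows["Rice-Brown"]["connections"])
```

Because `current_key` is only updated after the loop, a crashed rename can simply be called again with the same arguments.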


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
Yes, I can update that one entry. But what if that subcolumn key is used
across many different places?

['Jones-Bob']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['Crabtree-Sam']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['Rice-Brown']['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
...

I can update every single entry but now I need to keep track of them (which
I guess I'm doing anyway). I was wondering if there was a more elegant
solution but it seems unlikely based on the given constraints.


On Fri, Mar 12, 2010 at 5:26 PM, Brandon Williams  wrote:

> On Fri, Mar 12, 2010 at 7:21 PM, Peter Chang  wrote:
>
>> My original post is probably confusing. I was originally talking about
>> columns and I don't see what the solution is.
>
>
> Sorry, I misunderstood.
>
>> "So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key [for the subcolumn, not a row key]
>>
>> [user's lastname + user's firstname + user's uuid]
>>
>> This would result in sorted subcolumn and user list."
>>
>> Nevertheless, I still don't see/understand the solution. Let's say the
>> person's name changes. The sort is no longer valid. That column value would
>> need to be changed in order for the sort to be correct.
>>
>
> When their name changes, you delete the existing column and insert a new
> one with the correct name, which will then sort correctly.
>
> -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:21 PM, Peter Chang  wrote:

> My original post is probably confusing. I was originally talking about
> columns and I don't see what the solution is.


Sorry, I misunderstood.

> "So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key [for the subcolumn, not a row key]
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list."
>
> Nevertheless, I still don't see/understand the solution. Let's say the
> person's name changes. The sort is no longer valid. That column value would
> need to be changed in order for the sort to be correct.
>

When their name changes, you delete the existing column and insert a new one
with the correct name, which will then sort correctly.

-Brandon
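A minimal sketch of the delete-and-reinsert rename within one row, assuming a hypothetical in-memory `SortedRow` that keeps column names in byte order, roughly the way a BytesType comparator orders them; Python's unsigned, lexicographic `bytes` comparison stands in for the comparator.

```python
import bisect


def make_key(last: str, first: str, user_uuid: str) -> bytes:
    # Hypothetical "<last>-<first>-<uuid>" layout, encoded as UTF-8 bytes.
    return f"{last}-{first}-{user_uuid}".encode("utf-8")


class SortedRow:
    """In-memory model of one row whose column names are kept sorted."""

    def __init__(self) -> None:
        self.names = []   # column names, always in sorted order
        self.values = {}  # column name -> value

    def insert(self, name: bytes, value: bytes = b"") -> None:
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def delete(self, name: bytes) -> None:
        if name in self.values:
            del self.values[name]
            self.names.pop(bisect.bisect_left(self.names, name))

    def rename(self, old: bytes, new: bytes) -> None:
        # A column name cannot be edited in place: "renaming" means deleting the
        # old column and inserting a new one, which lands in its sorted position.
        value = self.values.get(old, b"")
        self.delete(old)
        self.insert(new, value)


alyssa = "1ab54760-2ca8-11df-aabd-005056c1"
jim = "1a6dd756b0-2ca1-11df-b937-005056c1"
row = SortedRow()
row.insert(make_key("Hacker", "Alyssa", alyssa))
row.insert(make_key("Jones", "Jim", jim))
row.rename(make_key("Hacker", "Alyssa", alyssa), make_key("Zamboni", "Alyssa", alyssa))
print([name.decode("utf-8") for name in row.names])
```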


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
To be more explicit:

['500c9280-2cdd-11df-869b-005056c1'] ['connections']
['Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1']
['500c9280-2cdd-11df-869b-005056c1'] ['connections']
['Jones-Jim-1a6dd756b0-2ca1-11df-b937-005056c1']

But Alyssa gets married and changes her name to Zamboni. The next time I
read these subcolumns, the users will not be sorted.
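Peter's scenario can be reproduced in a few lines. The `users` map and the `uuid_of` and `display` helpers are hypothetical, and the sketch assumes first and last names contain no "-" so the uuid can be split off. The stored subcolumn names still sort by the old surname, while the order a reader wants follows the current one.

```python
# Current user data, keyed by uuid (hypothetical in-memory stand-in).
users = {
    "1ab54760-2ca8-11df-aabd-005056c1": ("Zamboni", "Alyssa"),  # formerly Hacker
    "1a6dd756b0-2ca1-11df-b937-005056c1": ("Jones", "Jim"),
}

# Subcolumn names under ['500c9280-2cdd-11df-869b-005056c1']['connections'],
# written before the name change.
connections = [
    "Hacker-Alyssa-1ab54760-2ca8-11df-aabd-005056c1",
    "Jones-Jim-1a6dd756b0-2ca1-11df-b937-005056c1",
]


def uuid_of(column_name: str) -> str:
    # Assumes neither name part contains "-"; the uuid does, so split at most
    # twice from the left and keep the remainder.
    return column_name.split("-", 2)[2]


def display(user_uuid: str) -> str:
    last, first = users[user_uuid]
    return f"{first} {last}"


# Order in which the stored subcolumns come back (byte order of the names).
stored = [uuid_of(c) for c in sorted(connections, key=lambda c: c.encode("utf-8"))]
# Order the reader actually wants: by the current last and first name.
wanted = sorted(stored, key=lambda u: (users[u][0], users[u][1], u))

print("stored:", [display(u) for u in stored])
print("wanted:", [display(u) for u in wanted])
```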




On Fri, Mar 12, 2010 at 5:21 PM, Peter Chang  wrote:

> My original post is probably confusing. I was originally talking about
> columns and I don't see what the solution is.
>
> "So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key [for the subcolumn, not a row key]
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list."
>
> Nevertheless, I still don't see/understand the solution. Let's say the
> person's name changes. The sort is no longer valid. That column value would
> need to be changed in order for the sort to be correct.
>
>
> On Fri, Mar 12, 2010 at 5:10 PM, Brandon Williams wrote:
>
>> On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:
>>
>>> But wouldn't name + UUID be considered volatile? That was the crux of my
>>> questions.
>>
>>
>> It would, but the distinction here is that it is now a column, not a row
>> key.
>>
>>  -Brandon
>>
>
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
My original post is probably confusing. I was originally talking about
columns and I don't see what the solution is.

"So I was thinking I set the subcolumn compareWith to UTF8Type or
BytesType and construct a key [for the subcolumn, not a row key]

[user's lastname + user's firstname + user's uuid]

This would result in sorted subcolumn and user list."

Nevertheless, I still don't see/understand the solution. Let's say the
person's name changes. The sort is no longer valid. That column value would
need to be changed in order for the sort to be correct.
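Building the [lastname + firstname + uuid] key has one subtlety beyond renames. The parts need a separator that sorts below any character in a name. Without one, a short last name that is a prefix of a longer one can sort after it. The sketch below assumes a hypothetical `"\x00"` separator and case-folding so capitalization does not split the list. The `u1`..`u3` ids are placeholders for real uuids.

```python
SEP = "\x00"  # hypothetical separator; sorts below every other character


def name_key(last: str, first: str, user_uuid: str, sep: str = SEP) -> bytes:
    # Subcolumn name: last name, first name, uuid. casefold() keeps "jones" and
    # "Jones" together; the uuid keeps names unique when two users share a name.
    return sep.join([last.casefold(), first.casefold(), user_uuid]).encode("utf-8")


users = [("Hacker", "Al", "u1"), ("Hack", "Zoe", "u2"), ("jones", "Jim", "u3")]


def listing(sep: str) -> list:
    # Sort the way a byte-wise comparator would, then return display names.
    keyed = sorted((name_key(last, first, uid, sep), f"{first} {last}")
                   for last, first, uid in users)
    return [name for _, name in keyed]


print(listing(SEP))  # with a separator, "Hack" sorts before "Hacker"
print(listing(""))   # without one, "hackeral..." sorts before "hackzoe..."
```

The thread's "-" separator already sorts below ASCII letters, but a space or an apostrophe inside a name sorts lower still, which is why the sketch uses the lowest possible character.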


On Fri, Mar 12, 2010 at 5:10 PM, Brandon Williams  wrote:

> On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:
>
>> But wouldn't name + UUID be considered volatile? That was the crux of my
>> questions.
>
>
> It would, but the distinction here is that it is now a column, not a row
> key.
>
>  -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Fri, Mar 12, 2010 at 7:07 PM, Peter Chang  wrote:

> But wouldn't name + UUID be considered volatile? That was the crux of my
> questions.


It would, but the distinction here is that it is now a column, not a row
key.

-Brandon
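Brandon's distinction between a row key and a column is about placement: the row key alone decides which node stores the row. The sketch below is loosely modeled on a hash partitioner (Cassandra's RandomPartitioner derives tokens from an MD5 hash of the key). The four node names, their evenly spaced tokens, and the `token_for`/`node_for` helpers are hypothetical, and real token assignment differs.

```python
import bisect
import hashlib

# Hypothetical 4-node ring with evenly spaced tokens up to 2**127.
RING = 2 ** 127
NODES = ["node-a", "node-b", "node-c", "node-d"]
TOKENS = [RING // len(NODES) * (i + 1) for i in range(len(NODES))]


def token_for(row_key: str) -> int:
    # Token derived from an MD5 digest of the row key, so similar keys
    # land far apart on the ring.
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16) % RING


def node_for(row_key: str) -> str:
    # The row lives on the first node whose token is >= the key's token,
    # wrapping to the first node past the end of the ring.
    i = bisect.bisect_left(TOKENS, token_for(row_key))
    return NODES[i % len(NODES)]


alyssa = "1ab54760-2ca8-11df-aabd-005056c1"

# Name-based row key: a name change means a new key and a new token, and
# usually a different node, so the whole row is rewritten under the new key.
print(node_for("Hacker-Alyssa-" + alyssa), node_for("Zamboni-Alyssa-" + alyssa))

# uuid row key: placement never changes; renaming a column inside the row
# only touches data already stored on that node.
print(node_for(alyssa))
```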


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Peter Chang
But wouldn't name + UUID be considered volatile? That was the crux of my
questions.

On Fri, Mar 12, 2010 at 1:07 PM, Brandon Williams  wrote:

> On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang  wrote:
>
>> I'm wondering about good strategies for picking keys that I want to be
>> lexically sorted in a super column family. For example, my data looks like
>> this:
>>
>> [user1_uuid][connections][some_key_for_user2] = ""
>> [user1_uuid][connections][some_key_for_user3] = ""
>>
>> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
>> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key
>>
>> [user's lastname + user's firstname + user's uuid]
>>
>> This would result in sorted subcolumn and user list. That's fine. But I
>> wonder what would happen if, say, a user changes their last name. It happens
>> rarely, but I imagine people getting married and changing their name. Now
>> the sort is no longer correct. There seem to be some bad consequences to
>> creating keys based on data that can change.
>>
>> So what is the general (elegant, easy to maintain) strategy here? Always
>> sort in your server-side code and don't bother trying to have the data
>> sorted?
>>
>
> Having row keys based on something potentially volatile is something I
> would avoid, since the row key determines which machine the row belongs to,
> and moving data between machines isn't a cheap operation.
>
> What you'll probably want to do is make the key something unique (like a
> uuid), store the user's name as a column on the row (thus making it easy to
> update) and maintain a secondary index to get the name-based sorting you
> want. If you're expecting a few million users, maintaining the index in a
> special row will work fine (e.g., the row name is "NAMEINDEX" and the columns
> are the name+uuid, similar to what you described). If you have billions of
> users, you'll need to get a bit fancier (for example, partition the index
> based on the first letter of the last name).
>
> -Brandon
>


Re: Strategies for storing lexically ordered data in supercolumns

2010-03-12 Thread Brandon Williams
On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang  wrote:

> I'm wondering about good strategies for picking keys that I want to be
> lexically sorted in a super column family. For example, my data looks like
> this:
>
> [user1_uuid][connections][some_key_for_user2] = ""
> [user1_uuid][connections][some_key_for_user3] = ""
>
> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
> BytesType and construct a key
>
> [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumn and user list. That's fine. But I
> wonder what would happen if, say, a user changes their last name. It happens
> rarely, but I imagine people getting married and changing their name. Now
> the sort is no longer correct. There seem to be some bad consequences to
> creating keys based on data that can change.
>
> So what is the general (elegant, easy to maintain) strategy here? Always
> sort in your server-side code and don't bother trying to have the data
> sorted?
>
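The alternative Peter raises, sorting in application code, avoids stale keys entirely. A sketch, assuming connections are stored by uuid and names are looked up at read time; the `users` and `connections` dicts and `sorted_connections` are hypothetical in-memory stand-ins. The trade-off is noted in the comment: nothing can be paged by name until everything has been read.

```python
# Hypothetical in-memory stand-ins: user rows keyed by uuid (name stored as
# ordinary columns) and one user's connections, with the uuid as subcolumn name.
users = {
    "1ab54760-2ca8-11df-aabd-005056c1": {"last": "Zamboni", "first": "Alyssa"},
    "1a6dd756b0-2ca1-11df-b937-005056c1": {"last": "Jones", "first": "Jim"},
}
connections = {uid: "" for uid in users}


def sorted_connections(connections: dict, users: dict) -> list:
    # Names are looked up at read time, so a rename never leaves a stale key,
    # but every connection (and every connected user's name) must be read
    # before even the first page of the sorted list can be returned.
    return sorted(
        connections,
        key=lambda uid: (users[uid]["last"].casefold(),
                         users[uid]["first"].casefold(), uid),
    )


for uid in sorted_connections(connections, users):
    print(users[uid]["first"], users[uid]["last"])
```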

Having row keys based on something potentially volatile is something I would
avoid, since the row key determines which machine the row belongs to, and
moving data between machines isn't a cheap operation.

What you'll probably want to do is make the key something unique (like a
uuid), store the user's name as a column on the row (thus making it easy to
update) and maintain a secondary index to get the name-based sorting you
want. If you're expecting a few million users, maintaining the index in a
special row will work fine (e.g., the row name is "NAMEINDEX" and the columns
are the name+uuid, similar to what you described). If you have billions of
users, you'll need to get a bit fancier (for example, partition the index
based on the first letter of the last name).

-Brandon
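Brandon's suggested layout can be sketched in memory: user rows keyed by uuid, a single "NAMEINDEX" row whose column names are name+uuid, and an optional split into one index row per initial letter for very large user counts. `NameIndex`, the `NAMEINDEX-<letter>` row keys, and the `"\x00"` separator are hypothetical choices, not anything from the thread or a Cassandra API. A rename updates the user row in place and replaces one index column.

```python
import bisect
from collections import defaultdict

SEP = "\x00"  # hypothetical separator between name parts and uuid


class NameIndex:
    """In-memory model (not a Cassandra client) of the suggested layout:
    user rows keyed by uuid with the name stored as columns, plus index
    row(s) whose sorted column names are name + uuid."""

    def __init__(self, partitioned: bool = False) -> None:
        self.partitioned = partitioned
        self.users = {}                      # uuid -> {"last": ..., "first": ...}
        self.index_rows = defaultdict(list)  # index row key -> sorted column names

    def _index_row(self, last: str) -> str:
        # One "NAMEINDEX" row, or (hypothetically) one row per initial letter.
        if not self.partitioned:
            return "NAMEINDEX"
        return "NAMEINDEX-" + last.casefold()[:1]

    @staticmethod
    def _column(last: str, first: str, uid: str) -> bytes:
        return SEP.join([last.casefold(), first.casefold(), uid]).encode("utf-8")

    def _index_add(self, last: str, first: str, uid: str) -> None:
        bisect.insort(self.index_rows[self._index_row(last)],
                      self._column(last, first, uid))

    def _index_remove(self, last: str, first: str, uid: str) -> None:
        names = self.index_rows[self._index_row(last)]
        col = self._column(last, first, uid)
        i = bisect.bisect_left(names, col)
        if i < len(names) and names[i] == col:
            names.pop(i)

    def put_user(self, uid: str, last: str, first: str) -> None:
        old = self.users.get(uid)
        # The user row is keyed by uuid, so a name change is an ordinary update.
        self.users[uid] = {"last": last, "first": first}
        # The index column name does change: delete the old one, add the new.
        if old is not None:
            self._index_remove(old["last"], old["first"], uid)
        self._index_add(last, first, uid)

    def ordered_uuids(self) -> list:
        # Walk the index rows in key order; columns within a row are sorted.
        out = []
        for row_key in sorted(self.index_rows):
            out.extend(col.decode("utf-8").rsplit(SEP, 1)[1]
                       for col in self.index_rows[row_key])
        return out


alyssa = "1ab54760-2ca8-11df-aabd-005056c1"
jim = "1a6dd756b0-2ca1-11df-b937-005056c1"
idx = NameIndex(partitioned=True)
idx.put_user(alyssa, "Hacker", "Alyssa")
idx.put_user(jim, "Jones", "Jim")
idx.put_user(alyssa, "Zamboni", "Alyssa")  # one user-row update + one index swap
print([idx.users[u]["last"] for u in idx.ordered_uuids()])
```

With a hash partitioner, row keys do not come back in key order, so walking the partitions means the application iterates the known letter keys itself, as `ordered_uuids` does here with `sorted()`.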