A row is the data associated with a key in a given column family (CF).
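
To make that concrete, here is a rough sketch using plain Python dicts. It is only a mental model, not the Thrift API or how anything is stored on disk; the CF name, key, and column names are all made up.

```python
# Hypothetical in-memory picture of the data model; NOT Cassandra's API.
# A column family (CF) maps row keys to rows.
# A row maps column names to (value, timestamp) pairs.
user_profiles = {                      # one CF (name is invented)
    "evan": {                          # row key
        "name": ("Evan", 1250136000),  # column name -> (value, timestamp)
        "city": ("SF", 1250136000),
    },
}


def get_row(cf, key):
    """Return the row: every column stored under `key` in the given CF."""
    return cf.get(key, {})


row = get_row(user_profiles, "evan")
for column_name, (value, timestamp) in sorted(row.items()):
    print(column_name, value, timestamp)
```

In a super CF the picture would have one more level of nesting, with each top-level entry in the row holding a set of simple columns.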

On Thu, Aug 13, 2009 at 12:17 AM, Arin Sarkissian<[email protected]> wrote:
> Row? What are you guys referring to as a row?
>
> no - this isnt a joke
>
> Arin
>
> On Wed, Aug 12, 2009 at 9:39 PM, Evan Weaver<[email protected]> wrote:
>> PS. How's Avro these days? Or could we patch Thrift? Haven't looked at
>> the internals but assume they're scary.
>>
>> On Thu, Aug 13, 2009 at 12:23 AM, Evan Weaver<[email protected]> wrote:
>>> Incidentally, is there any specific reason the collation has to be
>>> pre-defined at the CF? What if any column could be an optional
>>> supercolumn with a collation set at runtime? Then all CFs would be the
>>> same.
>>>
>>> Evan
>>>
>>> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis<[email protected]> wrote:
>>>> If thrift were sane it would look something like
>>>>
>>>> struct Column {
>>>>   byte[] name,
>>>>   optional list<Column> subcolumns,
>>>>   optional int64 timestamp,
>>>>   optional byte[] value
>>>> }
>>>>
>>>> "you can either have the subcolumns, or the timestamp and value" seems
>>>> reasonable to me.
>>>>
>>>> of course in the real world, thrift can't do recursive structures, so
>>>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>>>> So... maybe not really an improvement after all. :)
>>>>
>>>> (Why am I not surprised to find out that protocol buffers does support
>>>> this? Sigh.)
>>>>
>>>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver<[email protected]> wrote:
>>>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>>>> rather than supercolumns and columns...mainly because the subcolumn
>>>>> position is optional, but the column_or_supercolumn position is not.
>>>>> So there is something we agree on.
>>>>>
>>>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>>>> It's still not exactly a kind of column.
>>>>>
>>>>> Evan
>>>>>
>>>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis<[email protected]> wrote:
>>>>>> I agree with the proposition that the SuperColumn name is weak.
>>>>>> (Although not, as I mentioned, Column or ColumnFamily.) And I could
>>>>>> go with schema over keyspace.
>>>>>>
>>>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>>>> from the config) and instead just have Columns, which may or may not
>>>>>> have SubColumns. You would define this as
>>>>>>
>>>>>> <ColumnFamily withSubColumns="true" .../>
>>>>>>
>>>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>>>> with how I think of things working. And now you just have Rows and
>>>>>> Columns! Just like a RDB! :P
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver<[email protected]> wrote:
>>>>>>> Points taken, and I agree, except in my experience the current names
>>>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>>>> column family and super column.
>>>>>>>
>>>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>>>
>>>>>>> Cluster
>>>>>>> Schema
>>>>>>> Row set
>>>>>>> Row w/key
>>>>>>> Field set
>>>>>>> Field
>>>>>>>
>>>>>>> "You take the user's key, and use that to insert into the Row Set
>>>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>>>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>>>> key."
>>>>>>>
>>>>>>> But let me study for a while and come up with a more researched
>>>>>>> proposal.
>>>>>>>
>>>>>>> Evan
>>>>>>>
>>>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis<[email protected]>
>>>>>>> wrote:
>>>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael
>>>>>>>> Koziarski<[email protected]> wrote:
>>>>>>>>> However I think it's worth considering this from a strategic
>>>>>>>>> perspective, looking at how we want the project do grow and change,
>>>>>>>>> rather than just as it is right now. The key to successful adoption
>>>>>>>>> is having a successful elevator pitch, you can start using a database
>>>>>>>>> without understanding relational-algebra because 'table' and 'column'
>>>>>>>>> are such simple ways to reason about the tool. As it stands
>>>>>>>>> cassandra's takes a whiteboard and 15 minutes, before people get what
>>>>>>>>> you're talking about.
>>>>>>>>
>>>>>>>> If you want to explain it as "sort of like a relational db" then
>>>>>>>>
>>>>>>>> table -> CF
>>>>>>>> column -> column
>>>>>>>> key -> key
>>>>>>>> row -> row
>>>>>>>>
>>>>>>>> That's the simple case, then all you have is "supercolumns can contain
>>>>>>>> a list of simple columns."
>>>>>>>>
>>>>>>>> That really doesn't seem so hard to me. I have explained this to
>>>>>>>> *managers*.
>>>>>>>>
>>>>>>>>> Assuming the project gets anything like the adoption it deserves, the
>>>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>>>> the future. So imposing costs on the current userbase which will give
>>>>>>>>> huge benefits to future users, should be something we're willing to
>>>>>>>>> do. In fact it's something that has been done repeatedly over the
>>>>>>>>> last few weeks.
>>>>>>>>
>>>>>>>> I agree. But as I said before I just don't see this as being an
>>>>>>>> improvement.
>>>>>>>>
>>>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>>>
>>>>>>>> As above.
>>>>>>>>
>>>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>>>> *still* confusing, and when I talk with people who wonder what
>>>>>>>>> cassandra is all about I get blank looks when telling them what things
>>>>>>>>> are called. If you step back and want to tell someone how you'd
>>>>>>>>> insert a tweet into someone's timeline using evan's weblog post:
>>>>>>>>>
>>>>>>>>> "You just take the user's key, and use that to insert into the
>>>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
>>>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>>>> new tweet's key"
>>>>>>>>>
>>>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>>>> cases it's different.
>>>>>>>>
>>>>>>>> When you're inserting something nested 3 levels deep a certain amount
>>>>>>>> of verbosity is unavoidable. With Evan's nomenclature,
>>>>>>>>
>>>>>>>> "You take the user's record ID, and use that to insert into the Record
>>>>>>>> Collection 'user associations' at Attribute Collection
>>>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>>>
>>>>>>>> I think that is a negative improvement. Yay, now we are talking about
>>>>>>>> Attribute Collections and Attributes instead of SuperColumns and
>>>>>>>> Columns. The same objections ("one object's name contains the
>>>>>>>> other's!) apply, plus the new one of sounding so generic that it could
>>>>>>>> apply to practically any system.
>>>>>>>>
>>>>>>>> -Jonathan
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Evan Weaver
>>>>>>>
>>>>>
>>>>> --
>>>>> Evan Weaver
>>>>>
>>>
>>> --
>>> Evan Weaver
>>>
>>
>> --
>> Evan Weaver
>>
