Row? What are you guys referring to as a row? No, this isn't a joke.
Arin

On Wed, Aug 12, 2009 at 9:39 PM, Evan Weaver wrote:
> PS. How's Avro these days? Or could we patch Thrift? Haven't looked at
> the internals but assume they're scary.
>
> On Thu, Aug 13, 2009 at 12:23 AM, Evan Weaver wrote:
>> Incidentally, is there any specific reason the collation has to be
>> pre-defined at the CF? What if any column could be an optional
>> supercolumn with a collation set at runtime? Then all CFs would be the
>> same.
>>
>> Evan
>>
>> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis wrote:
>>> If thrift were sane it would look something like
>>>
>>> struct Column {
>>>     byte[] name,
>>>     optional list<Column> subcolumns,
>>>     optional int64 timestamp,
>>>     optional byte[] value
>>> }
>>>
>>> "you can either have the subcolumns, or the timestamp and value" seems
>>> reasonable to me.
>>>
>>> of course in the real world, thrift can't do recursive structures, so
>>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>>> So... maybe not really an improvement after all. :)
>>>
>>> (Why am I not surprised to find out that protocol buffers does support
>>> this? Sigh.)
>>>
>>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver wrote:
>>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>>> rather than supercolumns and columns...mainly because the subcolumn
>>>> position is optional, but the column_or_supercolumn position is not.
>>>> So there is something we agree on.
>>>>
>>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>>> It's still not exactly a kind of column.
>>>>
>>>> Evan
>>>>
>>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis wrote:
>>>>> I agree with the proposition that the SuperColumn name is weak.
>>>>> (Although not, as I mentioned, Column or ColumnFamily.) And I could
>>>>> go with schema over keyspace.
>>>>>
>>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>>> from the config) and instead just have Columns, which may or may not
>>>>> have SubColumns. You would define this as
>>>>>
>>>>> <ColumnFamily withSubColumns="true" .../>
>>>>>
>>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>>> with how I think of things working. And now you just have Rows and
>>>>> Columns! Just like a RDB! :P
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver wrote:
>>>>>> Points taken, and I agree, except in my experience the current names
>>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>>> column family and super column.
>>>>>>
>>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>>
>>>>>> Cluster
>>>>>> Schema
>>>>>> Row set
>>>>>> Row w/key
>>>>>> Field set
>>>>>> Field
>>>>>>
>>>>>> "You take the user's key, and use that to insert into the Row Set
>>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>>> key."
>>>>>>
>>>>>> But let me study for a while and come up with a more researched proposal.
>>>>>>
>>>>>> Evan
>>>>>>
>>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis wrote:
>>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski wrote:
>>>>>>>> However I think it's worth considering this from a strategic
>>>>>>>> perspective, looking at how we want the project to grow and change,
>>>>>>>> rather than just as it is right now. The key to successful adoption
>>>>>>>> is having a successful elevator pitch, you can start using a database
>>>>>>>> without understanding relational-algebra because 'table' and 'column'
>>>>>>>> are such simple ways to reason about the tool. As it stands
>>>>>>>> cassandra's takes a whiteboard and 15 minutes, before people get what
>>>>>>>> you're talking about.
>>>>>>>
>>>>>>> If you want to explain it as "sort of like a relational db" then
>>>>>>>
>>>>>>> table -> CF
>>>>>>> column -> column
>>>>>>> key -> key
>>>>>>> row -> row
>>>>>>>
>>>>>>> That's the simple case, then all you have is "supercolumns can contain
>>>>>>> a list of simple columns."
>>>>>>>
>>>>>>> That really doesn't seem so hard to me. I have explained this to
>>>>>>> *managers*.
>>>>>>>
>>>>>>>> Assuming the project gets anything like the adoption it deserves, the
>>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>>> the future. So imposing costs on the current userbase which will give
>>>>>>>> huge benefits to future users, should be something we're willing to
>>>>>>>> do. In fact it's something that has been done repeatedly over the
>>>>>>>> last few weeks.
>>>>>>>
>>>>>>> I agree. But as I said before I just don't see this as being an
>>>>>>> improvement.
>>>>>>>
>>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>>
>>>>>>> As above.
>>>>>>>
>>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>>> *still* confusing, and when I talk with people who wonder what
>>>>>>>> cassandra is all about I get blank looks when telling them what things
>>>>>>>> are called. If you step back and want to tell someone how you'd
>>>>>>>> insert a tweet into someone's timeline using evan's weblog post:
>>>>>>>>
>>>>>>>> "You just take the user's key, and use that to insert into the
>>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
>>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>>> new tweet's key"
>>>>>>>>
>>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>>> case it's different.
>>>>>>>
>>>>>>> When you're inserting something nested 3 levels deep a certain amount
>>>>>>> of verbosity is unavoidable. With Evan's nomenclature,
>>>>>>>
>>>>>>> "You take the user's record ID, and use that to insert into the Record
>>>>>>> Collection 'user associations' at Attribute Collection
>>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>>
>>>>>>> I think that is a negative improvement. Yay, now we are talking about
>>>>>>> Attribute Collections and Attributes instead of SuperColumns and
>>>>>>> Columns. The same objections ("one object's name contains the
>>>>>>> other's!") apply, plus the new one of sounding so generic that it could
>>>>>>> apply to practically any system.
>>>>>>>
>>>>>>> -Jonathan
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Evan Weaver
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Evan Weaver
>>>>
>>>
>>
>> --
>> Evan Weaver
>>
>
> --
> Evan Weaver
>
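
For reference, the Column shape Jonathan sketches in the quoted thread ("you can either have the subcolumns, or the timestamp and value") and the "row" being asked about can be mocked up in plain Python. This is only an illustrative sketch, not Cassandra's or Thrift's actual API: the names Column, Row, ColumnFamily and insert_subcolumn, the dict-based containers, and the sample keys are all assumptions made for this example.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Column:
    """Hypothetical unified column: either subcolumns, or a timestamp and value."""
    name: bytes
    subcolumns: Optional[List["Column"]] = None
    timestamp: Optional[int] = None
    value: Optional[bytes] = None

    def __post_init__(self) -> None:
        is_parent = self.subcolumns is not None
        leaf_fields = (self.timestamp is not None, self.value is not None)
        if is_parent and any(leaf_fields):
            raise ValueError("a column with subcolumns cannot carry a timestamp or value")
        if not is_parent and not all(leaf_fields):
            raise ValueError("a plain column needs both a timestamp and a value")


# Assumed containers: a row maps column names to columns,
# and a column family maps row keys to rows.
Row = Dict[bytes, Column]
ColumnFamily = Dict[bytes, Row]


def insert_subcolumn(cf: ColumnFamily, row_key: bytes,
                     column_name: bytes, sub: Column) -> None:
    """Insert a subcolumn into the named column of the row stored under row_key."""
    row = cf.setdefault(row_key, {})
    parent = row.setdefault(column_name, Column(name=column_name, subcolumns=[]))
    if parent.subcolumns is None:
        raise ValueError("target column holds a value, not subcolumns")
    parent.subcolumns.append(sub)


# The tweet-timeline insert from the thread, using made-up keys.
user_associations: ColumnFamily = {}
insert_subcolumn(
    user_associations,
    b"user-key",                      # the user's key selects the row
    b"user_timeline",                 # the column that holds subcolumns
    Column(
        name=uuid.uuid1().bytes,      # time-based UUID representing now
        timestamp=int(time.time() * 1_000_000),
        value=b"tweet-key",           # key of the new tweet
    ),
)
```

In this sketch a "row" is simply everything stored under one key within a column family, which lines up with the "key -> key, row -> row" mapping quoted above.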
