Mark, I can work on that with you. We should do this regardless of naming changes etc. I'll even volunteer to do a PHP app based on the data model we mock up.
If you wanna coordinate some work on this you can reach me at:
email: [email protected] (or [email protected])
IM/Twitter/IRC/just_about_everything_online: phatduckk

- Arin

On Tue, Aug 11, 2009 at 10:36 PM, Mark McBride<[email protected]> wrote:
> It seems to me that what would be most helpful, regardless of changes,
> is having a document that describes the data model in more detail than
> the current data model wiki page. I can take a stab at creating a new
> page that includes examples if that would be useful.
>
> On Tue, Aug 11, 2009 at 10:34 PM, Arin Sarkissian<[email protected]> wrote:
>> I agree that the names are pretty horrible for a newbie...
>>
>> I'll echo the concerns that the RDBMS vernacular messes with a
>> newcomer's head. I feel like the words "Row" and "Column" are way too
>> loaded since most people have an RDBMS background... BUT
>>
>> In the BigTable paper we've got the term "Column Family". This term is
>> also used in HBase and Hypertable. Since the term's out there in the
>> wild I wouldn't feel comfortable ditching it and making something up
>> to fill its spot. That would lead to a scenario where folks with
>> experience with HBase, Hypertable and Bigtable get confused (or think
>> the naming is dumb) but would lessen the confusion for RDBMS peeps.
>> Doesn't sound like the right tradeoff: 4 sets of folks have something
>> new to digest instead of 1.
>>
>> The "bad" terms are "column" and "row". That's where the real issues
>> arise... but given the fact that I believe we should keep "column
>> family" I have no idea what we'd call the things inside the CF? It
>> would be odd as hell to have a CF contain "records" etc. Does that
>> mean we should keep it called "column"? IMO w/o an awesome
>> alternative, yes.
>>
>> The word "row" should go away tho...
>> When I first started using Cassandra I thought that: a key pointed to
>> a row and that row had one of each column family. This isn't the case
>> but the RDBMS terms + SQL-ish thinking caused me and many others to
>> assume as much. Took us a while to figure that out...
>>
>> But realistically how much of this confusion could be avoided with a
>> legit example? Once you see a good example you start getting it. A lot
>> of people have been pointed towards the ThriftInterface page on the
>> wiki which clears up next to nothing:
>> http://wiki.apache.org/cassandra/ThriftInterface . There's stuff like
>> "edges", "base_attributes" etc. It's next door to nonsensical..
>>
>> What if we had a real example that people could relate to... model a
>> blog or something along those lines & update the
>> http://wiki.apache.org/cassandra/ThriftInterface page to show how each
>> of the API methods would be used to accomplish basic tasks... ex: get
>> all comments for a blog entry, list entries in time order, list
>> entries tagged "bar", find all entries with "foo" in the body (kinda
>> like the Facebook mail search example).
>>
>> -Arin
>>
>>
>>
>> On Tue, Aug 11, 2009 at 10:09 PM, Curt Micol<[email protected]> wrote:
>>> Hello,
>>>
>>> I am hardly a developer, so this isn't directly addressed to me, but
>>> if I may comment on a couple of things from an outsider's
>>> (non-developer, new to this scale of database) perspective.
>>>
>>> On Wed, Aug 12, 2009 at 12:38 AM, Eric Evans<[email protected]> wrote:
>>>> On Tue, 2009-08-11 at 10:37 -0700, Evan Weaver wrote:
>>>>> In my experience, the naming of the data model has been a huge barrier
>>>>> to entry for users of Cassandra. This goes both for people familiar
>>>>> with SQL, and for people familiar with BigTable. I would like to
>>>>> change this before 0.4, since the 0.3 to 0.4 transition is the Great
>>>>> API Breakening.
>>>
>>> I agree that there is a barrier, specifically because most people have
>>> no experience with this type of data structure and as you mention are
>>> coming from SQL. Clearer names along with more documentation/examples
>>> will help grow the user base of Cassandra quite a bit.
>>>
>>>>> So technically this is not a bikeshed, because I'm happy to do all the
>>>>> work. I'll even submit a patch for Digg's Python client. Since there
>>>>> are no production deployments of ASF, and only a couple
>>>>> well-maintained clients, now is the time to break the world. A few
>>>>> hours of work now will pay off richly in terms of community
>>>>> involvement and reduced noob-explanation-time.
>>>
>>> I would offer my services here also if a change were accepted.
>>>
>>> And while I don't know what the exact names should be (nor am I
>>> qualified tbh), I think they should be clearer than they are. At this
>>> point they seem to be a mixture of RDBMS and Document DB terms. The
>>> change to 'keyspace' from 'table' I think was a first step in this
>>> process, but it should be taken further and all names normalized
>>> across the board to properly represent their relationship with each
>>> other. At least that's my very humble opinion.
>>>
>>> In response to Mr. Evan's comment regarding the Bigtable paper, does
>>> the Cassandra community want this to be a requirement for using the
>>> software? I would think not. Sure, most early adopters are coming
>>> from that paper, but it shouldn't be a source of entry to use the
>>> database, but rather to develop it.
>>>
>>> Again, my opinion carries little weight, but +1 from this user.
>>>
>>> Thanks for everyone's hard work, I am really excited to see how this
>>> project continues to progress.
>>>
>>> --
>>> # Curt Micol
>>>
>>
>
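
For anyone reading this thread later, the blog example proposed above could look roughly like the sketch below. It is only an illustration: it uses plain Python dictionaries rather than the Thrift API, and every name in it (the "Entries", "Comments" and "Tags" column families, the keys, and the helper functions) is made up for this sketch, not taken from Cassandra or the wiki. The assumed layout is column family -> key -> {column name: value}. Note that "entry-2" has no columns under "Comments", which is the point about a key not needing something in every column family.

```python
# Illustrative only: plain dicts standing in for a keyspace, NOT the Thrift API.
# Assumed (hypothetical) layout: column family -> key -> {column name: value}.
blog = {
    "Entries": {
        # keyed by entry id; columns are the entry's attributes
        "entry-1": {"title": "Hello", "body": "foo and more", "posted": "2009-08-10"},
        "entry-2": {"title": "Again", "body": "nothing here", "posted": "2009-08-11"},
    },
    "Comments": {
        # keyed by entry id; each column is one comment, named by its timestamp
        "entry-1": {"2009-08-10T12:00": "first!", "2009-08-10T13:00": "nice post"},
    },
    "Tags": {
        # keyed by tag; each column name is the id of an entry carrying that tag
        "bar": {"entry-2": ""},
    },
}


def comments_for(entry_key):
    """Get all comments for a blog entry, oldest first."""
    cols = blog["Comments"].get(entry_key, {})
    return [cols[name] for name in sorted(cols)]


def entries_in_time_order():
    """List entry ids ordered by their 'posted' column."""
    entries = blog["Entries"]
    return sorted(entries, key=lambda key: entries[key]["posted"])


def entries_tagged(tag):
    """List entry ids stored as column names under the given tag."""
    return sorted(blog["Tags"].get(tag, {}))


def entries_containing(word):
    """Naive full scan for entries whose body contains the word."""
    return sorted(key for key, cols in blog["Entries"].items() if word in cols["body"])
```

The last helper is a full scan, which a real deployment would likely replace with some kind of index; it is here only to cover the "find all entries with foo in the body" task from the list above.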
