I've been thinking about this for a number of days, and again, while I am not a developer, I thought I might toss in a proposal if that's okay.
Since putting together a schema diagram and having a number of people review it, I think a change is warranted. Too many people are coming from the RDBMS world, and the terms used by Cassandra conflict with the terms they are already familiar with.

The TLDR version is as follows:

Object (Column)
ObjectFamily (ColumnFamily)
Directory (Row)
ObjectContainer (SuperColumn)
Namespace (Keyspace)

The long version...

Object (Column)

As Evan has stated repeatedly, 'column' is a bit misleading, especially when compared to other types of database systems. I think this is probably the most important change to the data model names, and it's exactly where I started, since this is the 'core' of Cassandra. 'Object' gives the impression that this is a piece of data; it's relatively structured, but the name gives no impression of how strict that structure is. 'Objects' have names that have values and timestamps. Simple and to the point. 'Object' doesn't come with the preconceived notions that 'column' comes with, and it leaves room for Cassandra to define what an 'object' is without any conflict with preexisting data structures.

By changing this, we can move up the ladder to the other data types and easily rename them to something that 'contains objects' or 'accesses objects'. This allows us to describe the data model in the name structure without having to get too deep into the definition.

Directory (Row)

'Row' is currently unnamed, but it is still a structure that exists in the model. It's not specifically data itself, but more of a mapping of how to get to objects (using keys). 'Directory' fills this void quite well. It is easily explained as a path to get to data and not data itself.

ObjectFamily (ColumnFamily)

There's no argument that the one direct link to the BigTable paper is 'column families'. It's perhaps the only structure that is virtually the same in both pieces of software. Considering this, I think we need to avoid too drastic a change.
With that said, I think a change is necessary due to the differences in columns between the two databases. 'Object family' is descriptive of the relation between objects and removes any reference to tabular structures, while keeping a loose relationship to 'column family' in the BigTable paper.

ObjectContainer (SuperColumn)

I could see this being shortened to 'container' in everyday conversation. However, 'objectcontainer' fits nicely with the rest of the data model names and is descriptive of its purpose and use. Ultimately a 'supercolumn' is nothing more than a named container of columns (and I've seen the word 'container' used to describe supercolumns on at least 3 different occasions). 'Supercolumn' had no real connection to what exactly it was defining, but with 'object container' we have a clear understanding that we are naming the structure that holds objects. Or as I explained it to a friend, we are naming the 'jar' and not the 'honey'. :)

Namespace (Keyspace)

This one I go back and forth on. I know it's been changed from 'Table' to 'keyspace' and Evan proposed 'database', but I think that 'namespace' is really what we are talking about. Wikipedia has this as the first line to describe 'namespace': "A namespace is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols (i.e., names)."

Originally I thought 'objectspace' would fit better, but I think 'namespace' comes with a better history and is clearer about what this structure really is, especially when you relate the name to how namespaces are used in Ruby, Python and Java. Ultimately though, if it came down to the existing names, I think I prefer 'keyspace' over 'table' or 'database'.

The only issue I see with all of these names is the potential conflict with programming languages and their objects. I know next to nothing about Java, so I don't know if there would be a conflict here.
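To make the nesting and the naming concern a little more concrete, here is a rough Python sketch. It is only an illustration of how the proposed names might fit together, not Cassandra's actual implementation or API: the class layouts, field names and sample data are all assumptions made up for this example. The last lines check the proposed names against Python's own reserved words using the standard `keyword` module.

```python
import keyword
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Object:
    """Today's Column: a name with a value and a timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class ObjectContainer:
    """Today's SuperColumn: a named container of objects (the 'jar')."""
    name: str
    objects: Dict[str, Object] = field(default_factory=dict)


@dataclass
class Directory:
    """Today's Row: a key that leads to objects, not data itself."""
    key: str
    entries: Dict[str, Union[Object, ObjectContainer]] = field(default_factory=dict)


@dataclass
class ObjectFamily:
    """Today's ColumnFamily: a named group of related directories."""
    name: str
    directories: Dict[str, Directory] = field(default_factory=dict)


@dataclass
class Namespace:
    """Today's Keyspace: the outermost grouping of object families."""
    name: str
    families: Dict[str, ObjectFamily] = field(default_factory=dict)


# Hypothetical sample data, for illustration only.
email = Object(name="email", value="user@example.com", timestamp=1)
user_dir = Directory(key="user1", entries={email.name: email})
users = ObjectFamily(name="Users", directories={user_dir.key: user_dir})
ns = Namespace(name="App", families={users.name: users})

# Check the proposed names (as written and lowercased) against
# Python's reserved words.
proposed = ["Object", "ObjectFamily", "Directory", "ObjectContainer", "Namespace"]
clashes = [n for n in proposed
           if keyword.iskeyword(n) or keyword.iskeyword(n.lower())]
```

In Python, 'object' is a built-in name rather than a reserved word, so a class called `Object` is legal there. This check covers Python only; 'namespace', for instance, is a keyword in C++, so each language would need its own look.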
I've run the following Google search: 'reserved words in *', where '*' is Ruby, Python, Java and C++, and received no mention of 'object' being a reserved word in any of those languages. I also grep'd through the current source code, and there don't seem to be any real conflicts that couldn't be renamed so as not to clash with this naming structure.

In the end, I think it's a good idea to look at this and work out a solution. Documentation and tutorials are going to help, but I think people are so entrenched in the RDBMS world that there is somewhat of a barrier to understanding Cassandra's data model.

Thanks for your time,
--
# Curt Micol