The "DataModelv2" page has been changed by ronaldmathies.
http://wiki.apache.org/cassandra/DataModelv2?action=diff&rev1=17&rev2=18

--------------------------------------------------

  
  = Introduction =
  
- Cassandra has a data model that far different from normal relational 
databases, instead of having schemas, tables and column the data
+ Cassandra has a data model that is far different from normal relational 
databases: instead of having schemas, tables and columns, the data
  model consists of a structure of lists and maps.
  
  When we start to look from the highest level we have clusters, clusters are 
physical machines operating together and forming a logical 
+ Cassandra instance. A cluster can contain several keyspaces. A keyspace is a 
group consisting of various !ColumnFamilies; in general an application uses a 
single keyspace. !ColumnFamilies consist of rows, which in turn consist of 
multiple values (Columns) per row.
- Cassandra instance. A cluster can contain several keyspaces. A keyspace is 
very similar to a relation database schema and contains
- a number of !ColumnFamilies. !ColumnFamilies can be compared to a table in a 
relation database. And a !ColumnFamily contains Columns.
  
- A !ColumnFamily comes in two flavors, the first one we already described, 
which is a !ColumnFamily which has columns. The second one
+ A !ColumnFamily comes in two flavors, the first one we already described, 
which is a !ColumnFamily which has columns. The second !ColumnFamily
- is also called a !SuperColumnFamily, this one contains !SuperColumns where 
the !SuperColumns contain a list of Columns. If it becomes 
- confusing just read on, it will become clearer.
+ contains !SuperColumns where the !SuperColumns contain a list of Columns. If 
it becomes confusing just read on, it will become clearer.
+ 
+ So to recap, a !ColumnFamily can contain either a list of Columns or a list 
of !SuperColumns.
  
  We'll start from the bottom up, moving from the leaves of Cassandra's data 
structure (columns) up to the root of the tree (the cluster).
  
  {{{
+ --- REMARK: ---
  From my experience, comparing concepts to those in a relational database 
pretty consistently confuses people. This section intermixes the "structure of 
lists and maps" approach with relational db comparisons "ColumnFamilies can be 
compared to a table in a relational database.", which is probably worse still.
  
- I also think it's best to avoid referring to column families as containers.
+ I also think it's best to avoid referring to column families as containers.''
+ 
+ --- SOLUTION: ---
+ Changed the above description; there is no reference to a relational database 
anymore.
  }}}
  
  == Columns ==
  
- A Column is also known as a Tuple (triplet), it contains a name, value and a 
timestamp.
+ A Column consists of a name, value and a timestamp.
  
  {{{
+ --- REMARK: ---
  This wording suggests that Tuple is a synonym for Column (which is not true).
+ 
+ --- SOLUTION: ---
+ Removed the synonym
  }}}
  
  All values are supplied by the client, including the 'timestamp'. This means 
that clocks on the clients should be synchronized (synchronizing the clocks in 
the Cassandra server environment is useful too), as these timestamps are used 
for conflict resolution. In many cases the 'timestamp' is not used in client 
applications, and it becomes convenient to think of a column as a name/value 
pair. For the remainder of this document, 'timestamps' will be elided for 
readability. It is also worth noting that the name and value are binary values, 
although in many applications they are UTF-8 serialized strings.
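The role of the client-supplied timestamp can be sketched in a few lines of Python. This is only an illustration, not Cassandra's implementation: the `Column` class and the `resolve` function are hypothetical names, and the sketch assumes the write with the higher timestamp wins, with ties broken arbitrarily in favor of the first argument.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for a column: name, value and timestamp."""
    name: bytes       # binary, often a UTF-8 encoded string
    value: bytes      # binary, often a UTF-8 encoded string
    timestamp: int    # supplied by the client, not by the server


def resolve(a: Column, b: Column) -> Column:
    """Assumed rule: keep the write that carries the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


# Two conflicting writes to the same column name from different clients.
older = Column(b"lastname", b"Stewart", 1270084000)
newer = Column(b"lastname", b"Steward", 1270084021)
winner = resolve(older, newer)
```

If the client clocks drift, the "newer" write may carry the lower timestamp and lose, which is why the clocks should be synchronized.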
@@ -69, +77 @@

    </Keyspace>
  </Keyspaces>
  
- In Cassandra, each column family is stored in a separate file, and the file 
is sorted in row (i.e. key) major order. Related columns, those that you'll 
access together, should be kept within the same column family.
+ In Cassandra, each column family is sorted in row (i.e. key) major order. 
Related columns, those that you'll access together, should be kept within the 
same column family.
  
  {{{
+ --- REMARK: ---
  IMO, you should avoid implementation details unless they are really relevant, 
as it distracts, (i.e. "each column family is stored in a separate file").
+ 
+ --- SOLUTION: ---
+ Nice remark, this should indeed be covered in a separate topic about the 
storage itself.
  }}}
  
  The row key determines which machine the data is stored on. A key can be 
used for several column families at the same time; this does not, however, 
imply that the data from these column families is related. The semantics of 
having data for the same key in two different column families are entirely up 
to the client. Also, the columns can differ between the two column families. In 
fact there may be a virtually unlimited set of column names defined, which 
leads to fairly common use of the column name as a piece of runtime-populated 
data. This is unusual in storage systems, particularly if you're coming from 
the relational database world. For each key you can have data from multiple 
column families associated with it. However, these are logically distinct, 
which is why the Thrift interface is oriented around accessing one 
!ColumnFamily per key at a time. On the other hand, a number of methods within 
the Thrift interface make use of this functionality; for example, 
batch_insert and batch_mutate make it possible to insert or modify data in 
multiple !ColumnFamilies at the same time, as long as the key for the different 
column families is the same. 
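The idea of one key being shared by several column families can be pictured with nested Python dictionaries. This is a rough sketch only: `batch_insert_sketch` is a hypothetical helper, not the Thrift batch_insert call, and the column family names, key and values are made up for illustration.

```python
# column family name -> row key -> {column name: value}
keyspace = {
    "Authors": {"ronald": {"email": "ronald@example.org"}},
    "Posts": {"ronald": {"title": "my-new-guitar"}},
}


def batch_insert_sketch(keyspace, key, mutations):
    """Apply columns for ONE key to several column families in one call."""
    for cf_name, columns in mutations.items():
        keyspace.setdefault(cf_name, {}).setdefault(key, {}).update(columns)


# The same key is used in both column families, but each family keeps its
# own set of column names. In "Posts" the column name is itself data that
# is only known at runtime (here a timestamp-like string).
batch_insert_sketch(
    keyspace,
    "ronald",
    {
        "Authors": {"country": "NL"},
        "Posts": {"1270084021": "first comment"},
    },
)
```

Nothing in the structure ties the "ronald" row in one family to the "ronald" row in the other; any relationship between them is up to the client.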
@@ -122, +134 @@

  ||           || "lastname"  || "Steward"    || 1270084021      ||
  ||           || "birthday"  || "01/01/1982" || 1270084021      ||
  
- As you can see it looks the same as a !ColumnFamily, the only difference is 
the usage, a !SuperColumn is used within a !SuperColumnFamily, so
+ As you can see it looks the same as a !ColumnFamily; the only difference is 
the usage. A !SuperColumn is used within a !ColumnFamily, so
  it adds an extra layer to your data structure: instead of having only a row 
which consists of a key and a list of columns, we can now have a row
  which consists of a key and a list of super columns, each of which has its 
own key and a list of columns.
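The extra layer can be sketched with nested Python dictionaries. This is an illustration under assumptions: the super column key "personal" and the helper `get_column` are hypothetical, and timestamps are elided as elsewhere on this page.

```python
# Row of a ColumnFamily containing Columns:
#   {column name: value}
standard_row = {"lastname": "Steward", "birthday": "01/01/1982"}

# Row of a ColumnFamily containing SuperColumns:
#   {super column key: {column name: value}}
super_row = {
    "personal": {"lastname": "Steward", "birthday": "01/01/1982"},
}


def get_column(row, super_column_key, column_name):
    """Look up one column value inside one super column of a row."""
    return row[super_column_key][column_name]


lastname = get_column(super_row, "personal", "lastname")
```

A single super column holds exactly what a whole standard row would hold, so reading a value needs one more lookup step.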
  
- == SuperColumnFamily ==
+ == !ColumnFamily containing !SuperColumns ==
  
- The !SuperColumnFamily isn't much different from a normal !ColumnFamily 
except that it contains a list of super columns per row instead of
+ A !ColumnFamily which contains !SuperColumns isn't that much different from a 
!ColumnFamily containing Columns: instead of rows consisting of Columns, we 
have rows consisting of !SuperColumns.
+ 
- a list of columns. To following example defines a super column family in your 
storage-conf.xml:
+ The following example defines a super column family in your storage-conf.xml:
  
  {{{
+ --- REMARK: ---
  IMO, the term "SuperColumnFamily" should die.
+ 
+ --- SOLUTION: ---
+ And it's dead; I've removed it everywhere and rephrased the sentences to make 
it clear. 
  }}}
  
  An example configuration of an Authors !ColumnFamily using the UTF-8 sorting 
implementation would be:
@@ -143, +160 @@

    </Keyspace>
  </Keyspaces> 
  
- The !ColumnType tells cassandra that the Posts columns family is a super 
column family, the !CompareSubcolumnsWith attribute defines the sorting 
behavior of the keys of the super columns.
+ The !ColumnType tells Cassandra that the Posts column family is a 
!ColumnFamily containing !SuperColumns; the !CompareSubcolumnsWith attribute 
defines the sorting behavior of the keys of the super columns.
  
  Model representation:
  
- ||<-2> '''!SuperColumnFamily'''      ||
+ ||<-2> '''!ColumnFamily'''      ||
  || '''key''' || '''list'''           ||
  || binary    || 1 .. * !SuperColumns ||
  
  Data representation:
  
- ||<-5> '''!SuperColumnFamily'''                                                ||
+ ||<-5> '''!ColumnFamily'''                                                     ||
  || '''Key'''        ||<-4> '''!SuperColumns'''                                 ||
  || "my-new-guitar"  || '''key''' ||<-3> '''Columns'''                          ||
  ||                  || post      || '''name''' || '''value'''      || '''timestamp''' ||
@@ -177, +194 @@

  
  == Keyspaces ==
  
- A keyspace is the first dimension of the Cassandra hash, and is the container 
for column families. Keyspaces are of roughly the same granularity as a schema 
or database (i.e. a logical collection of tables) in the RDBMS world. They are 
the configuration and management point for column families, and is also the 
structure on which batch inserts are applied. In most cases you will have one 
keyspace for an application.
+ A keyspace is the first dimension of the Cassandra hash, and is the container 
for the !ColumnFamilies. Keyspaces are of roughly the same granularity as a 
schema or database (i.e. a logical collection of tables) in the RDBMS world. 
They are the configuration and management point for column families, and are 
also the structure on which batch inserts are applied. In most cases you will 
have one keyspace for an application.
  
  == Modeling your application ==
  
