[Cassandra Wiki] Update of "DataModelv2" by StaffanEricsson

Apache Wiki Tue, 09 Mar 2010 07:34:26 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "DataModelv2" page has been changed by StaffanEricsson.
http://wiki.apache.org/cassandra/DataModelv2?action=diff&rev1=9&rev2=10

--------------------------------------------------

- ## page was copied from DataModel
  = Introduction =
  
  Cassandra has a data model that can most easily be thought of as a four or 
five dimensional hash.
  
  The basic concepts are:
-  * Cluster: the machines (nodes) in a logical Cassandra instance.  Clusters 
can contain multiple keyspaces.
+  * Cluster: a number of nodes (servers) in a logical Cassandra instance.  
Clusters can contain multiple keyspaces.
   * Keyspace: a namespace for !ColumnFamilies, typically one per application.
   * !ColumnFamilies contain multiple columns, each of which has a name, value, 
and a timestamp, and which are referenced by row keys.
   * !SuperColumns can be thought of as columns that themselves have subcolumns.
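
One way to picture the "four or five dimensional hash" is as nested dictionaries. The Python sketch below is an illustration only, not Cassandra's storage layout or API: the keyspace name `Keyspace1`, the nesting order (which follows the JSON examples on this page), and the helper `lookup` are all assumptions made for the example.

```python
# Hypothetical in-memory picture of the data model; not how Cassandra stores data.
data = {
    "Keyspace1": {                          # keyspace (name is an assumption)
        "mccv": {                           # row key
            "Users": {                      # column family holding plain columns
                "webSite": "http://bar.com",
            },
            "Tags": {                       # column family holding supercolumns
                "thrift": {                 # supercolumn
                    "jira": "http://issues.apache.org/jira/browse/THRIFT",
                },
            },
        },
    },
}


def lookup(keyspace, key, column_family, *path):
    """Walk keyspace -> row key -> column family -> remaining path (hypothetical helper)."""
    node = data[keyspace][key][column_family]
    for name in path:
        node = node[name]
    return node


# Four dimensions: keyspace, key, column family, column.
four_d = lookup("Keyspace1", "mccv", "Users", "webSite")
# Five dimensions: the same path plus a supercolumn level.
five_d = lookup("Keyspace1", "mccv", "Tags", "thrift", "jira")
```

Timestamps are left out of the sketch, so each column is reduced to a name/value pair.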
@@ -34, +33 @@

  }
  }}}
  
- All values are supplied by the client, including the 'timestamp'.  This means 
that clocks on the clients should be synchronized (synchronizing the clocks of 
the Cassandra servers is useful too), as these timestamps are used for conflict 
resolution.  In many cases the 'timestamp' is not used in client applications, 
and it becomes convenient to think of a column as a name/value pair.  For the 
remainder of this document, 'timestamps' will be elided for readability.  It is 
also worth noting that the name and value are binary values, although in many 
applications they are UTF-8 serialized strings.
+ All values are supplied by the client, including the 'timestamp'.  This means 
that clocks on the clients should be synchronized (synchronizing the clocks of 
the Cassandra servers is useful too), as these timestamps are used for conflict 
resolution.  In many cases the 'timestamp' is not used in client applications, 
and it becomes convenient to think of a column as a name/value pair.  For the 
remainder of this document, 'timestamps' will be mostly elided for readability.  
It is also worth noting that the name and value are binary values, although in 
many applications they are UTF-8 serialized strings.
  
- Timestamps can be anything you like, but milliseconds since 1970 is a 
convention, as returned by System.currentTimeMillis() in Java. Whatever you use, 
it must be consistent across the application; otherwise earlier changes may 
overwrite newer ones.
+ Timestamps can be any number you like, but milliseconds since 1970 is a 
convention, as returned by System.currentTimeMillis() in Java. Whatever you use, 
it must be consistent across the application; otherwise earlier changes may 
overwrite newer ones.
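
The following Python sketch illustrates why the timestamp unit has to be consistent. It assumes a simplified "highest timestamp wins" rule; the function `resolve`, the example values, and the choice to keep the existing column on a tie are assumptions for illustration, not Cassandra's actual implementation.

```python
def resolve(existing, incoming):
    """Keep whichever column carries the higher timestamp (simplified sketch).

    On a tie this sketch keeps the existing column; that choice is an assumption.
    """
    if existing is None or incoming["timestamp"] > existing["timestamp"]:
        return incoming
    return existing


# The first client writes with a millisecond timestamp.
first = {"name": "webSite", "value": "http://old.example", "timestamp": 1268148866000}
# A second client writes later in wall-clock time, but uses seconds.
second = {"name": "webSite", "value": "http://new.example", "timestamp": 1268148900}

# The later write is numerically smaller, so it loses.
mixed_units_winner = resolve(first, second)

# With the same unit on both clients, the later write wins as intended.
second_ms = dict(second, timestamp=second["timestamp"] * 1000)
same_units_winner = resolve(first, second_ms)
```

In the mixed-unit case the older value survives even though it was written first, which is the failure described above.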
  
  = Column Families =
  
- A column family is a container for columns, analogous to the table in a 
relational system.  You define column families in your storage-conf.xml file, 
and cannot modify them (or add new column families) without restarting your 
Cassandra process.  A column family holds an ordered list of columns, which you 
can reference by the column name.
+ A column family is a container for columns.  You define column families in 
your storage-conf.xml file, and cannot modify them (or add new column families) 
without restarting your Cassandra process.  A column family holds an ordered 
list of columns, which you can reference by the column name.
  
  Column families have a configurable ordering applied to the columns within 
each row, which affects the behavior of the get_slice call in the thrift API.  
Out of the box ordering implementations include ASCII, UTF-8, Long, and UUID 
(lexical or time).
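
As a rough illustration of why the ordering matters, the Python sketch below sorts the same column names under a bytewise-style string ordering and under a numeric ordering, then takes the first two. The helper `slice_sketch` is hypothetical and is not the Thrift get_slice call; it only mimics the idea of returning columns in comparator order.

```python
def slice_sketch(column_names, key=None, count=2):
    """Return the first `count` column names under the given ordering (hypothetical helper)."""
    return sorted(column_names, key=key)[:count]


names = ["9", "10", "100"]

# Character-by-character comparison, similar in spirit to an ASCII ordering.
ascii_slice = slice_sketch(names)

# Numeric comparison, similar in spirit to a Long ordering.
long_slice = slice_sketch(names, key=int)
```

Under the string ordering "10" and "100" come before "9", so a slice of two columns returns a different pair than it does under the numeric ordering.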
  
@@ -53, +52 @@

  A JSON representation of the key -> column families -> column structure is
  {{{
  {
-    "mccv":{
+    "mccv":{  //Key
-       "Users":{
+       "Users":{ //Column Family
-          "emailAddress":{"name":"emailAddress", "value":"[email protected]"},
+          "emailAddress":{"name":"emailAddress", "value":"[email protected]", 
"timestamp":"1234567890"}, //Column
-          "webSite":{"name":"webSite", "value":"http://bar.com"}
+          "webSite":{"name":"webSite", "value":"http://bar.com", 
"timestamp":"1234567890"}  //Column
        },
-       "Stats":{
-          "visits":{"name":"visits", "value":"243"}
+       "Stats":{ //Column Family
+          "visits":{"name":"visits", "value":"243", "timestamp":"1234567890"}  
//Column
        }
     },
-    "user2":{
-       "Users":{
+    "matt":{ //Key
+       "Users":{ //Column Family
-          "emailAddress":{"name":"emailAddress", "value":"[email protected]"},
+          "emailAddress":{"name":"emailAddress", "value":"[email protected]", 
"timestamp":"1234567890"},  //Column
-          "twitter":{"name":"twitter", "value":"user2"}
+          "twitter":{"name":"twitter", "value":"user2", 
"timestamp":"1234567890"} //Column
        }
     }
  }
  }}}
  
- Note that the key "mccv" identifies data in two different column families, 
"Users" and "Stats". This does not imply that data from these column families 
is related.  The semantics of having data for the same key in two different 
column families is entirely up to the application.  Also note that within the 
"Users" column family, "mccv" and "user2" have different column names defined.  
This is perfectly valid in Cassandra.  In fact there may be a virtually 
unlimited set of column names defined, which leads to fairly common use of the 
column name as a piece of runtime populated data.  This is unusual in storage 
systems, particularly if you're coming from the RDBMS world.
+ Note that the key "mccv" identifies data in two different column families, 
"Users" and "Stats". This does not imply that data from these column families 
is related.  The semantics of having data for the same key in two different 
column families is entirely up to the application.  Also note that within the 
"Users" column family, "mccv" and "matt" have different column names defined.  
This is perfectly valid in Cassandra.  In fact there may be a virtually 
unlimited set of column names defined, which leads to fairly common use of the 
column name as a piece of runtime populated data.  This is unusual in storage 
systems, particularly if you're coming from the RDBMS world.
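
The points in the note above can be sketched with plain Python dictionaries. This is only an illustration: timestamps are elided, the e-mail values are placeholders, and the helper `column_names` and the date-based column name are hypothetical.

```python
# key -> column family -> columns, with each column reduced to name -> value.
rows = {
    "mccv": {
        "Users": {"emailAddress": "<placeholder>", "webSite": "http://bar.com"},
        "Stats": {"visits": "243"},
    },
    "matt": {
        "Users": {"emailAddress": "<placeholder>", "twitter": "user2"},
    },
}


def column_names(rows, key, column_family):
    """List the column names stored for one key in one column family."""
    return sorted(rows.get(key, {}).get(column_family, {}))


mccv_columns = column_names(rows, "mccv", "Users")
matt_columns = column_names(rows, "matt", "Users")

# A column name can itself be data computed at runtime (hypothetical example).
day = "2010-03-09"
rows["mccv"]["Stats"]["visits:" + day] = "17"
stats_columns = column_names(rows, "mccv", "Stats")
```

The two rows share a column family but not a fixed set of columns, and nothing needs to be declared before the new "visits:..." column is written.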
  
  = Keyspaces =
  
@@ -86, +85 @@

  A JSON description of this layout:
  {{{
  {
-   "mccv": {
+   "mccv": { //Key
-     "Tags": {
-       "cassandra": {
+     "Tags": { //column family 
+       "cassandra": { //SuperColumn
-         "incubator": {"incubator": "http://incubator.apache.org/cassandra/"},
+         "incubator": {"incubator": "http://incubator.apache.org/cassandra/", 
"timestamp":"1234567890"}, //Column
-         "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA"}
+         "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA", 
"timestamp":"1234567890"} //Column
        },
-       "thrift": {
+       "thrift": { //SuperColumn
-         "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT"}
+         "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT", 
"timestamp":"1234567890"} //Column
        }
      }  
    }
