Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "DataModelv2" page has been changed by StaffanEricsson.
http://wiki.apache.org/cassandra/DataModelv2?action=diff&rev1=9&rev2=10

--------------------------------------------------

- ## page was copied from DataModel
  = Introduction =
  Cassandra has a data model that can most easily be thought of as a four or five dimensional hash. The basic concepts are:
-  * Cluster: the machines (nodes) in a logical Cassandra instance. Clusters can contain multiple keyspaces.
+  * Cluster- a number of nodes (servers) in a logical Cassandra instance. Clusters can contain multiple keyspaces.
   * Keyspace: a namespace for !ColumnFamilies, typically one per application.
   * !ColumnFamilies contain multiple columns, each of which has a name, value, and a timestamp, and which are referenced by row keys.
   * !SuperColumns can be thought of as columns that themselves have subcolumns.
@@ -34, +33 @@
  }
  }}}
- All values are supplied by the client, including the 'timestamp'. This means that clocks on the clients should be synchronized (in the Cassandra server environment is useful also), as these timestamps are used for conflict resolution. In many cases the 'timestamp' is not used in client applications, and it becomes convenient to think of a column as a name/value pair. For the remainder of this document, 'timestamps' will be elided for readability. It is also worth noting the name and value are binary values, although in many applications they are UTF8 serialized strings.
+ All values are supplied by the client, including the 'timestamp'. This means that clocks on the clients should be synchronized (in the Cassandra server environment is useful also), as these timestamps are used for conflict resolution. In many cases the 'timestamp' is not used in client applications, and it becomes convenient to think of a column as a name/value pair. For the remainder of this document, 'timestamps' will be mostly elided for readability. It is also worth noting the name and value are binary values, although in many applications they are UTF8 serialized strings.
- Timestamps can be anything you like, but milliseconds since 1970 is a convention, as returned by System.currentTimeMillis() in Java. Whatever you use, it must be consistent across the application otherwise earlier changes may overwrite newer ones.
+ Timestamps can be any number you like, but milliseconds since 1970 is a convention, as returned by System.currentTimeMillis() in Java. Whatever you use, it must be consistent across the application otherwise earlier changes may overwrite newer ones.
  = Column Families =
- A column family is a container for columns, analogous to the table in a relational system. You define column families in your storage-conf.xml file, and cannot modify them (or add new column families) without restarting your Cassandra process. A column family holds an ordered list of columns, which you can reference by the column name.
+ A column family is a container for columns. You define column families in your storage-conf.xml file, and cannot modify them (or add new column families) without restarting your Cassandra process. A column family holds an ordered list of columns, which you can reference by the column name.
  Column families have a configurable ordering applied to the columns within each row, which affects the behavior of the get_slice call in the thrift API. Out of the box ordering implementations include ASCII, UTF-8, Long, and UUID (lexical or time).
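Editor's sketch (not part of the wiki change): the timestamp and ordering rules described in the hunk above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Cassandra's implementation. The names `Column`, `ColumnFamilyRow`, `insert`, and `columns_in_order` are hypothetical, the row is a plain in-memory dict, column names are sorted as ordinary strings to stand in for a configured ordering, and a write with an equal timestamp is assumed not to replace the stored one.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the client; milliseconds since 1970 by convention


class ColumnFamilyRow:
    """Hypothetical in-memory stand-in for one row of a column family."""

    def __init__(self):
        self._columns = {}

    def insert(self, column):
        # Keep whichever write carries the larger timestamp, so an older
        # write that arrives late cannot overwrite a newer one.
        current = self._columns.get(column.name)
        if current is None or column.timestamp > current.timestamp:
            self._columns[column.name] = column

    def columns_in_order(self):
        # Plain string ordering here; an assumption standing in for the
        # configurable per-column-family ordering.
        return [self._columns[name] for name in sorted(self._columns)]


def now_millis():
    # One consistent clock convention for every writer in the application.
    return int(time.time() * 1000)


row = ColumnFamilyRow()
row.insert(Column("webSite", "http://bar.com", 1234567890))
row.insert(Column("emailAddress", "first", 1234567890))
row.insert(Column("emailAddress", "second", 1234567891))
row.insert(Column("emailAddress", "stale", 1234567000))  # ignored: older timestamp
```

After these calls the row holds two columns, and `emailAddress` keeps the value written with the largest timestamp. If writers mixed units (say, seconds on one client and milliseconds on another), the comparison in `insert` would silently favor the wrong write, which is why the convention has to be consistent across the application.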
@@ -53, +52 @@
  A JSON representation of the key -> column families -> column structure is
  {{{
  {
-    "mccv":{
+    "mccv":{ //Key
-      "Users":{
+      "Users":{ //Column Family
-        "emailAddress":{"name":"emailAddress", "value":"[email protected]"},
+        "emailAddress":{"name":"emailAddress", "value":"[email protected]", "timestamp":"1234567890"}, //Column
-        "webSite":{"name":"webSite", "value":"http://bar.com"}
+        "webSite":{"name":"webSite", "value":"http://bar.com", "timestamp":"1234567890"} //Column
       },
-      "Stats":{
-        "visits":{"name":"visits", "value":"243"}
+      "Stats":{ //Column Family
+        "visits":{"name":"visits", "value":"243", "timestamp":"1234567890"} //Column
       }
     },
-    "user2":{
-      "Users":{
+    "matt":{ //Key
+      "Users":{ //Column Family
-        "emailAddress":{"name":"emailAddress", "value":"[email protected]"},
+        "emailAddress":{"name":"emailAddress", "value":"[email protected]", "timestamp":"1234567890"}, //Column
-        "twitter":{"name":"twitter", "value":"user2"}
+        "twitter":{"name":"twitter", "value":"user2", "timestamp":"1234567890"} //Column
       }
     }
  }
  }}}
- Note that the key "mccv" identifies data in two different column families, "Users" and "Stats". This does not imply that data from these column families is related. The semantics of having data for the same key in two different column families is entirely up to the application. Also note that within the "Users" column family, "mccv" and "user2" have different column names defined. This is perfectly valid in Cassandra. In fact there may be a virtually unlimited set of column names defined, which leads to fairly common use of the column name as a piece of runtime populated data. This is unusual in storage systems, particularly if you're coming from the RDBMS world.
+ Note that the key "mccv" identifies data in two different column families, "Users" and "Stats". This does not imply that data from these column families is related. The semantics of having data for the same key in two different column families is entirely up to the application. Also note that within the "Users" column family, "mccv" and "matt" have different column names defined. This is perfectly valid in Cassandra. In fact there may be a virtually unlimited set of column names defined, which leads to fairly common use of the column name as a piece of runtime populated data. This is unusual in storage systems, particularly if you're coming from the RDBMS world.
  = Keyspaces =
@@ -86, +85 @@
  A JSON description of this layout:
  {{{
  {
-    "mccv": {
+    "mccv": { //Key
-      "Tags": {
-        "cassandra": {
+      "Tags": { //column family
+        "cassandra": { //SuperColumn
-          "incubator": {"incubator": "http://incubator.apache.org/cassandra/"},
+          "incubator": {"incubator": "http://incubator.apache.org/cassandra/", "timestamp":"1234567890"}, //Column
-          "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA"}
+          "jira": {"jira": "http://issues.apache.org/jira/browse/CASSANDRA", "timestamp":"1234567890"} //Column
         },
-        "thrift": {
+        "thrift": { //SuperColumn
-          "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT"}
+          "jira": {"jira": "http://issues.apache.org/jira/browse/THRIFT", "timestamp":"1234567890"} //Column
         }
       }
     }
