Matthias Broecheler created CASSANDRA-9231:
----------------------------------------------

             Summary: Support Routing Key as part of Partition Key
                 Key: CASSANDRA-9231
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
             Project: Cassandra
          Issue Type: Wish
          Components: Core
            Reporter: Matthias Broecheler


Provide support for sub-dividing the partition key into a routing key and a 
non-routing key component. Currently, all columns that make up the partition 
key of the primary key are also routing keys, i.e. they determine which nodes 
store the data. This proposal would give the data modeler the ability to 
designate only a subset of the columns that comprise the partition key to be 
routing keys. The non-routing key columns of the partition key identify the 
partition but are not used to determine where to store the data.

Consider the following example table definition:
CREATE TABLE foo (
  a int,
  b int,
  c int,
  d int,
  PRIMARY KEY (([a], b), c)
);

(a, b) is the partition key, c is the clustering key, and d is a regular 
column. The square brackets designate column a as the routing key. This means 
that only the value of column a is used to determine the node for data 
placement (i.e. only the value of column a is murmur3 hashed to compute the 
token). Column b is still needed to identify the partition but does not 
influence placement.
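
The difference in placement can be sketched in a few lines of Python. This is 
only an illustration under loud assumptions: the node list is hypothetical, MD5 
stands in for murmur3 (which is not in the Python standard library), and a 
simple modulo stands in for the real token ring. None of the function names 
below exist in Cassandra.

```python
import hashlib
import struct

# Hypothetical three-node cluster, for illustration only.
NODES = ["node1", "node2", "node3"]


def token(*columns):
    """Hash int column values to a signed 64-bit token.

    Stand-in for the partitioner: MD5 is used here only because it ships
    with Python; it is NOT the hash Cassandra uses.
    """
    data = b"".join(struct.pack(">i", c) for c in columns)
    digest = hashlib.md5(data).digest()
    return struct.unpack(">q", digest[:8])[0]


def node_for(tok):
    # Simplified placement: modulo over the node list instead of a token ring.
    return NODES[tok % len(NODES)]


def place_current(a, b):
    # Current behavior: every partition key column feeds the token.
    return node_for(token(a, b))


def place_proposed(a, b):
    # Proposed behavior: only routing key column a feeds the token.
    # Column b still identifies the partition but is ignored for placement.
    return node_for(token(a))


if __name__ == "__main__":
    for b in range(5):
        print("a=7 b=%d current=%s proposed=%s"
              % (b, place_current(7, b), place_proposed(7, b)))
```

Under the proposed scheme every partition with a = 7 maps to the same node 
regardless of b, whereas the current scheme may scatter those partitions 
across the cluster.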

This has the benefit that all rows with the same routing key (but potentially 
different non-routing key columns of the partition key) are stored on the same 
node, and applications built on top of Cassandra can exploit that 
co-locality.
Currently, the only way to achieve co-locality is within a partition. However, 
this approach has two limitations: a) there are theoretical and (more 
importantly) practical limits on the size of a partition, and b) rows within 
a partition are ordered and an index is built to exploit that ordering. For 
large partitions, that overhead is significant if ordering isn't needed.
In other words, routing keys afford a simple means to achieve scalable 
node-level co-locality without ordering, while clustering keys afford 
page-level co-locality with ordering. As such, they address different 
co-locality needs, giving the data modeler the flexibility to choose what 
their application requires.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
