[ https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534403#comment-14534403 ]
Benedict commented on CASSANDRA-9231:
-------------------------------------
bq. invalidate less documentation/existing assumptions
But we won't invalidate them: it will still be true of the partition key; the
routing key would always be a subset of the partition key, so the statements
still hold true. The difference is that the partition key distributes the data
both within and without the node, whereas the routing key only without. So it's
a refinement rather than a rewrite/invalidation.
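
To illustrate the distinction, here is a rough Python sketch, not Cassandra code. The helpers {{token}} and {{place}}, the MD5 stand-in for the Murmur3 hash, and the modulo "ring" of four nodes are all assumptions made for the example.

```python
import hashlib

def token(columns):
    """Toy stand-in for the partitioner hash (MD5 here, purely for illustration)."""
    data = "\x00".join(str(c) for c in columns).encode()
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big", signed=True)

def place(partition_key, routing_len, num_nodes=4):
    """Return (node, partition identity) for a partition key.

    Only the first `routing_len` columns (the routing key) pick the node;
    the full partition key still identifies the partition on that node.
    The modulo placement is a simplification of a real token ring.
    """
    routing_key = partition_key[:routing_len]
    node = token(routing_key) % num_nodes
    return node, partition_key

# PRIMARY KEY (([a], b), c): a is the routing key, (a, b) the partition key.
node1, part1 = place((1, 10), routing_len=1)
node2, part2 = place((1, 20), routing_len=1)
```

Under these assumptions both keys land on the same node because they share {{a}}, while {{(1, 10)}} and {{(1, 20)}} remain distinct partitions.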
bq. Besides, that's really only one of my point.
There are also two things that seem to be conflated in your proposal: per table
partitioners, and arbitrary functions as partitioners. The latter is more
problematic than the former, since we need to know certain things about the
token distribution, such as order preservation, midpoint calculation, random
token creation; even ring description is apparently specialized (perhaps this
can be abstracted, not sure).
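
As a rough sketch of the operations listed above, the following Python outlines what a partitioner would need to expose. The class and method names are hypothetical and only loosely inspired by IPartitioner; the signed 64-bit token range and the MD5 hash are assumptions for the example.

```python
import hashlib
import random
from abc import ABC, abstractmethod

class Partitioner(ABC):
    """Hypothetical interface covering the properties named above."""
    preserves_order = False

    @abstractmethod
    def token(self, key: bytes) -> int: ...

    @abstractmethod
    def midpoint(self, left: int, right: int) -> int: ...

    @abstractmethod
    def random_token(self, rng: random.Random) -> int: ...

class ToyHashPartitioner(Partitioner):
    """Hashing partitioner over an assumed signed 64-bit token ring."""
    MIN, MAX = -2**63, 2**63 - 1
    RING_SIZE = 2**64
    preserves_order = False  # hashing destroys key order

    def token(self, key: bytes) -> int:
        return int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)

    def midpoint(self, left: int, right: int) -> int:
        # Walk clockwise from left to right, wrapping past MAX if needed.
        distance = (right - left) % self.RING_SIZE
        mid = left + distance // 2
        return mid - self.RING_SIZE if mid > self.MAX else mid

    def random_token(self, rng: random.Random) -> int:
        return rng.randint(self.MIN, self.MAX)

p = ToyHashPartitioner()
```

An arbitrary user-supplied function would have to provide all of these consistently, which is where the difficulty lies.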
However we can deliver a lot of the functionality you suggest with just
arbitrary function application to the fields in the partition key when defining
the routing key. I don't think this should be in the initial version, for the
record, but defining {{PRIMARY KEY (( [truncate(a),b] a, b), ...)}} would
achieve the same goal.
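
A minimal Python sketch of that idea, assuming {{truncate}} rounds an integer down to a multiple of some width; the width of 100, the helper names, and the MD5 stand-in for the real hash are all hypothetical.

```python
import hashlib

def truncate(value, width=100):
    """Hypothetical routing function: round down to a multiple of `width`."""
    return value - (value % width)

def token(columns):
    """Toy stand-in for the partitioner hash (MD5 here, purely for illustration)."""
    data = "\x00".join(str(c) for c in columns).encode()
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big", signed=True)

def routing_token(a, b):
    # Mirrors a routing key of [truncate(a), b] over partition key (a, b).
    return token((truncate(a), b))

same_bucket = routing_token(101, 7) == routing_token(199, 7)
```

Here partitions {{(101, 7)}} and {{(199, 7)}} stay distinct but hash to the same token, so they would be placed together.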
Permitting per-table IPartitioner declarations also seems like a good thing to
support, but seems a different goal to me; that's an even lower level decision,
and really all you want is hashed/partitioned. But you want those to be _good_
at their jobs; if you screw that up, C* may behave unexpectedly.
> Support Routing Key as part of Partition Key
> --------------------------------------------
>
> Key: CASSANDRA-9231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
> Project: Cassandra
> Issue Type: Wish
> Components: Core
> Reporter: Matthias Broecheler
> Fix For: 3.x
>
>
> Provide support for sub-dividing the partition key into a routing key and a
> non-routing key component. Currently, all columns that make up the partition
> key of the primary key are also routing keys, i.e. they determine which nodes
> store the data. This proposal would give the data modeler the ability to
> designate only a subset of the columns that comprise the partition key to be
> routing keys. The non-routing key columns of the partition key identify the
> partition but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY (([a], b), c)
> );
> (a,b) is the partition key, c is the clustering key, and d is just a column.
> In addition, the square brackets identify the routing key as column a. This
> means that only the value of column a is used to determine the node for data
> placement (i.e. only the value of column a is murmur3 hashed to compute the
> token). In addition, column b is needed to identify the partition but does
> not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially
> different non-routing key columns of the partition key) are stored on the
> same node and that knowledge of such co-locality can be exploited by
> applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition.
> However, this approach has the limitations that: a) there are theoretical and
> (more importantly) practical limitations on the size of a partition and b)
> rows within a partition are ordered and an index is built to exploit such
> ordering. For large partitions that overhead is significant if ordering isn't
> needed.
> In other words, routing keys afford a simple means to achieve scalable
> node-level co-locality without ordering while clustering keys afford
> page-level co-locality with ordering. As such, they address different
> co-locality needs giving the data modeler the flexibility to choose what is
> needed for their application.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)