[jira] [Commented] (CASSANDRA-9231) Support Routing Key as part of Partition Key

Sylvain Lebresne (JIRA) Fri, 08 May 2015 05:44:51 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534435#comment-14534435 ]


Sylvain Lebresne commented on CASSANDRA-9231:
---------------------------------------------

bq. the partition key distributes the data both within and without the node, 
whereas the routing key only without

I honestly don't understand what that sentence means, *especially* in terms of
modeling (the concept of distribution within a node sounds an awful lot like
getting into implementation details). I know I'm not very smart, but let's say
I'm still not sold on the supposed simplicity of explaining the concept.

bq. There are also two things that seem to be conflated in your proposal: per 
table partitioners, and arbitrary functions as partitioners.

I'm not sure why you're trying to find complexity in what I'm suggesting.
Technically, the routing key idea is just saying that for a specific table,
instead of using the "default" partitioner hash function on the partition key
to compute the token, we'll use a function that first projects some part of
said partition key and then applies the hash function. It is using a custom
token function, just a super special one. I'm only suggesting we allow any
function instead of just either the default or another very special function.
This involves no more work on midpoint calculation, random token creation and
whatnot than the routing key idea does.
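
To make the comparison concrete, here is a minimal Python sketch of the three
token computations discussed above. It is an illustration under stated
assumptions, not Cassandra code: all function names are hypothetical, MD5
stands in for the partitioner hash (the description on this issue mentions
murmur3) only because it is in the standard library, and partition key columns
are assumed to be 32-bit ints as in the example table.

```python
import hashlib
import struct


def default_hash(data: bytes) -> int:
    # Stand-in for the partitioner hash function (an assumption: MD5 is used
    # here only for convenience). Returns a signed 64-bit token.
    digest = hashlib.md5(data).digest()
    return struct.unpack(">q", digest[:8])[0]


def serialize(columns) -> bytes:
    # Hypothetical serialization: each column is a big-endian 32-bit int.
    return b"".join(struct.pack(">i", c) for c in columns)


def default_token(partition_key):
    # Default behaviour: hash the whole partition key.
    return default_hash(serialize(partition_key))


def routing_key_token(partition_key, routing_indexes):
    # Routing key idea: project some columns of the partition key, then hash.
    projected = tuple(partition_key[i] for i in routing_indexes)
    return default_hash(serialize(projected))


def custom_token(partition_key, token_fn):
    # Generic idea: any function of the whole partition key yields the token.
    return token_fn(partition_key)


def project_a_then_hash(partition_key):
    # The routing key [a] from the example table, written as a token function.
    return default_hash(serialize(partition_key[:1]))
```

Under these assumptions, `routing_key_token((1, 2), [0])` and
`custom_token((1, 2), project_a_then_hash)` compute the same token, which is
the sense in which the routing key is one special case of a custom token
function.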

I am not in any way suggesting per-table partitioners. I don't want to do it
ever because that's a lot of complexity that I'm really not convinced is worth
it. What I am saying is that by allowing a generic custom token function
(instead of just a syntax for one specific custom function), we'll likely
actually cover most of the use cases for per-table partitioners (probably not
all, but most). And this with virtually no added complexity compared to the
routing key idea.

bq. However we can deliver a lot of the functionality you suggest with just 
arbitrary function application to the fields in the partition key when defining 
the routing key.

That's almost exactly what I'm suggesting, except that by making it just one 
function on the whole partition key, it's actually more flexible and you don't 
have to introduce 2 concepts: the routing key and then functions on routing key 
elements.
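
As a rough illustration of that difference, the Python sketch below models
"routing key plus functions on its elements" and shows that it can always be
wrapped as a single function on the whole partition key, while a single
function can also combine columns. The names and the example functions are
hypothetical and only describe the value that would be fed to the hash; none
of this is proposed syntax.

```python
def routing_key_with_functions(partition_key, element_fns):
    # Two concepts: which columns form the routing key (the dict keys), and a
    # function applied to each of those columns (the dict values).
    return tuple(fn(partition_key[i]) for i, fn in sorted(element_fns.items()))


def as_single_function(element_fns):
    # The same behaviour expressed as one function on the whole partition key.
    def token_input(partition_key):
        return routing_key_with_functions(partition_key, element_fns)
    return token_input


def combined_bucket(partition_key):
    # Something per-element functions cannot express: mixing two columns
    # into one value before hashing.
    return ((partition_key[0] + partition_key[1]) % 16,)
```

For example, `as_single_function({1: lambda ts: ts // 86400})` routes on the
day bucket of the second column only, exactly as the two-concept version
would, whereas `combined_bucket` has no per-element equivalent.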


> Support Routing Key as part of Partition Key
> --------------------------------------------
>
>                 Key: CASSANDRA-9231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Matthias Broecheler
>             Fix For: 3.x
>
>
> Provide support for sub-dividing the partition key into a routing key and a 
> non-routing key component. Currently, all columns that make up the partition 
> key of the primary key are also routing keys, i.e. they determine which nodes 
> store the data. This proposal would give the data modeler the ability to 
> designate only a subset of the columns that comprise the partition key to be 
> routing keys. The non-routing key columns of the partition key identify the 
> partition but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY  (([a], b), c ) );
> (a,b) is the partition key, c is the clustering key, and d is just a column. 
> In addition, the square brackets identify the routing key as column a. This 
> means that only the value of column a is used to determine the node for data 
> placement (i.e. only the value of column a is murmur3 hashed to compute the 
> token). In addition, column b is needed to identify the partition but does 
> not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially 
> different non-routing key columns of the partition key) are stored on the 
> same node and that knowledge of such co-locality can be exploited by 
> applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition. 
> However, this approach has the limitations that: a) there are theoretical and 
> (more importantly) practical limitations on the size of a partition and b) 
> rows within a partition are ordered and an index is built to exploit such 
> ordering. For large partitions that overhead is significant if ordering isn't 
> needed.
> In other words, routing keys afford a simple means to achieve scalable 
> node-level co-locality without ordering while clustering keys afford 
> page-level co-locality with ordering. As such, they address different 
> co-locality needs giving the data modeler the flexibility to choose what is 
> needed for their application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
