[
https://issues.apache.org/jira/browse/CASSANDRA-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116170#comment-14116170
]
Drew Kutcharian edited comment on CASSANDRA-7850 at 8/30/14 2:00 AM:
---------------------------------------------------------------------
Yes, but then I might end up with very wide _thrift_ rows.
Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}}
where records with the same block_id get stored on the same node *regardless* of
the value of breed_bucket. But I don't want to use {{PRIMARY KEY (block_id,
breed_bucket, breed)}} since in that case all the records for a block_id would
end up in a single _thrift_ row.
So, ideally the layout would be:
block_id -> decides the node
(block_id, breed_bucket) -> decides the _thrift_ row. Old school "row key"
breed -> prefix of _thrift_ columns. Old school "column name prefix"
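A rough Python sketch of that layout, purely illustrative: {{token}} and {{storage_row_key}} are hypothetical names rather than Cassandra APIs, and MD5 is only a stand-in hash that is not meant to reproduce any real partitioner's token values. The idea is that only a prefix of the composite partition key feeds the token, while the full composite still identifies the _thrift_ row.

```python
import hashlib
import struct


def token(partition_key, hashed_prefix_len):
    """Derive a signed 64-bit token from only the first
    `hashed_prefix_len` components of a composite partition key.
    (Hypothetical helper; MD5 is a stand-in hash.)"""
    h = hashlib.md5()
    for component in partition_key[:hashed_prefix_len]:
        raw = str(component).encode("utf-8")
        # Length-prefix each component so ("ab", "c") and ("a", "bc")
        # cannot produce the same byte stream.
        h.update(struct.pack(">I", len(raw)))
        h.update(raw)
    return int.from_bytes(h.digest()[:8], "big", signed=True)


def storage_row_key(partition_key):
    """The full composite still identifies the thrift row."""
    return tuple(partition_key)


# Same block_id, different breed_bucket: same token (same node),
# but two distinct thrift rows.
a = ("block-1", 0)
b = ("block-1", 7)
assert token(a, hashed_prefix_len=1) == token(b, hashed_prefix_len=1)
assert storage_row_key(a) != storage_row_key(b)
```

Under this assumption, hashing with {{hashed_prefix_len=2}} would correspond to today's behaviour, where every component of the partition key contributes to the token.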
was (Author: drew_kutchar):
Yes, but then I might end up with very wide rows.
Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}}
where records with same block_id and breed_bucket get stored on the same node,
but in different _thrift_ rows so I don't have very wide rows (millions of
_thrift_ columns per _thrift_ row).
> Composite Aware Partitioner
> ---------------------------
>
> Key: CASSANDRA-7850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7850
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Drew Kutcharian
>
> Since C* supports composites for partition keys, I think it'd be useful to
> have the ability to only use the first (or first few) components of the key to
> calculate the token hash.
> A naive use case would be multi-tenancy:
> Say we have accounts and accounts have users. So we would have the following
> tables:
> {code}
> CREATE TABLE account (
> id timeuuid PRIMARY KEY,
> company text
> );
> {code}
> {code}
> CREATE TABLE user (
> id timeuuid PRIMARY KEY,
> accountId timeuuid,
> email text,
> password text
> );
> {code}
> {code}
> // Get users by account
> CREATE TABLE user_account_index (
> accountId timeuuid,
> userId timeuuid,
> PRIMARY KEY(accountId, userId)
> );
> {code}
> Say we want to get all the users that belong to an account. We would first
> have to get the results from user_account_index and then use a multi-get
> (WHERE IN) to get the records from user table. Now this multi-get part could
> potentially query a lot of different nodes in the cluster. It’d be great if
> there were a way to store all users of an account on a single node, so that
> the multi-get would only need to query a single node.
> With this improvement we would be able to define the user table like so:
> {code}
> CREATE TABLE user (
> id timeuuid,
> accountId timeuuid,
> email text,
> password text,
> PRIMARY KEY(((accountId),id)) //extra parentheses
> );
> {code}
> I'm not too sure about the notation; it could be something like PRIMARY
> KEY(((accountId),id)) where the "(accountId)" means use this part to
> calculate the hash and ((accountId),id) is the actual partition key.
> The main complication I see with this is that we would have to use the table
> definition when calculating hashes so we know which components of the
> partition keys need to be used for hash calculation.