[jira] [Updated] (CASSANDRA-5054) new partitioner for rowkey pairs (P1, P2) targeting hierarchical data

Dominique De Vito (JIRA) Tue, 11 Dec 2012 05:03:25 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dominique De Vito updated CASSANDRA-5054:
-----------------------------------------

    Description: 
This new partitioner is submitted here both for validation and as a proposal 
to other Cassandra users (it may belong in Cassandra core, or in some 
'contrib' directory).

This new partitioner is a variant of RandomPartitioner with special token 
computation for composite rowkeys of size 2.

The use case for this partitioner is hierarchical data, that is, rowkeys like 
(directory id, file id).
The partitioner computes tokens so that rows with the same "directory id" are 
very likely to be on the same node: the goal is that, when Cassandra is 
queried for multiple file ids under the same directory id, only a limited 
number of nodes need to be contacted.

So, for a composite rowkey of size 2 like (P1, P2), the partitioner computes 
the token as follows: <code>merge(getHighBits(md5(P1)), 
getLowBits(md5(P2)))</code>.
For a rowkey that is NOT a pair, the partitioner returns the same value as 
RandomPartitioner.
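
To make the formula above concrete, here is a minimal sketch, not the 
submitted patch. The class and method names are hypothetical. It assumes that 
"high bits" means the first 8 bytes of the 16-byte MD5 digest, that "low bits" 
means the last 8 bytes, and that the top bit is cleared so the result stays 
non-negative and below 2^127. The fallback for non-pair keys is assumed to 
mirror RandomPartitioner (absolute value of the MD5 digest read as a signed 
integer).

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PairTokenSketch {

    // MD5 digest of the input; always 16 bytes.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Token for a pair (P1, P2): high 64 bits from md5(P1), low 64 bits from md5(P2).
    // Assumption: the 64/64 split and the cleared top bit are illustrative choices.
    public static BigInteger pairToken(byte[] p1, byte[] p2) {
        byte[] h1 = md5(p1);
        byte[] h2 = md5(p2);
        byte[] merged = new byte[16];
        System.arraycopy(h1, 0, merged, 0, 8); // getHighBits(md5(P1))
        System.arraycopy(h2, 8, merged, 8, 8); // getLowBits(md5(P2))
        merged[0] &= 0x7F;                     // keep the token below 2^127
        return new BigInteger(1, merged);
    }

    // Token for a key that is not a pair.
    // Assumption: same scheme as RandomPartitioner, i.e. abs(md5(key)).
    public static BigInteger singleToken(byte[] key) {
        return new BigInteger(md5(key)).abs();
    }
}
```

With this layout, all rows sharing P1 fall into one contiguous token interval 
of width 2^64, which is what keeps them on a small number of nodes, while P2 
only decides the position inside that interval.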

This partitioner is expected to be used with Cassandra 1.1 or above.
Cassandra stores the pair (token, rowkey) in sstables. Since v1.1, during 
compaction, Cassandra no longer sorts on the token only, but on the pair 
(token, rowkey). So, if a custom partitioner produces token collisions (that 
is, multiple rowkeys with the same token), this is not a problem with v1.1 or 
above because, to retrieve a row, Cassandra compares the full pair 
(token, rowkey).
As the token collision rate for this partitioner is unknown, it should only be 
used with Cassandra 1.1 or above.
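
The effect of ordering on the pair (token, rowkey) can be illustrated with a 
small sketch. This is not Cassandra's code: the class names, the use of String 
rowkeys, and the TreeMap standing in for a sorted sstable index are all 
assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.TreeMap;

public class TokenKeyOrderSketch {

    // Hypothetical stand-in for an entry ordered by token first, rowkey second.
    public static final class TokenAndKey implements Comparable<TokenAndKey> {
        final BigInteger token;
        final String rowKey;

        public TokenAndKey(BigInteger token, String rowKey) {
            this.token = token;
            this.rowKey = rowKey;
        }

        @Override
        public int compareTo(TokenAndKey other) {
            int byToken = token.compareTo(other.token);
            // The rowkey breaks ties, so equal tokens do not make entries equal.
            return byToken != 0 ? byToken : rowKey.compareTo(other.rowKey);
        }
    }

    // Builds a tiny sorted index in which two rowkeys share the same token.
    public static TreeMap<TokenAndKey, String> sampleIndex() {
        TreeMap<TokenAndKey, String> index = new TreeMap<>();
        BigInteger shared = BigInteger.valueOf(42);
        index.put(new TokenAndKey(shared, "dirA/file1"), "row 1");
        index.put(new TokenAndKey(shared, "dirA/file2"), "row 2");
        index.put(new TokenAndKey(BigInteger.valueOf(7), "dirB/file1"), "row 3");
        return index;
    }
}
```

If the comparison used the token alone, the second insert would overwrite the 
first; with the rowkey as tie-breaker, both colliding rows remain individually 
retrievable.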
    
> new partitioner for rowkey pairs (P1, P2) targeting hierarchical data
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-5054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5054
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Core
>            Reporter: Dominique De Vito
>            Priority: Trivial
