[ 
https://issues.apache.org/jira/browse/CASSANDRA-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529097#comment-13529097
 ] 

Jonathan Ellis commented on CASSANDRA-5054:
-------------------------------------------

My first reaction is that we probably don't want to encourage people to do 
this instead of using a single wide row.  What problem with the latter are 
you trying to address?
                
> new partitioner for rowkey pairs (P1, P2) targeting hierarchical data
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-5054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5054
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Core
>            Reporter: Dominique De Vito
>            Priority: Trivial
>         Attachments: RandomPartitioner4Pair.java
>
>
> This new partitioner is submitted here both for validation and as a proposal 
> to other Cassandra users (maybe this partitioner has its place within 
> Cassandra core, or in some 'contrib' directory).
> This new partitioner is a variant of RandomPartitioner with special token 
> computation for composite rowkeys of size 2.
> The use case for this partitioner is rowkeys for hierarchical data, that 
> is, rowkeys like (directory id, file id).
> This partitioner computes tokens so that rows with the same "directory id" 
> are likely to be on the same node: the goal is that, when Cassandra is 
> asked about multiple file ids for the same directory id, only a limited 
> number of nodes needs to be queried.
> So, in case of a composite rowkey of size 2 like (P1, P2), the partitioner 
> computes the token as follows: <code>merge(getHighBits(md5(P1)), 
> getLowBits(md5(P2)))</code>.
> In case of a rowkey that is NOT a pair, the partitioner returns the same 
> value as RandomPartitioner.
> This partitioner is expected to be used with Cassandra 1.1 or above.
> Cassandra stores the pair (token, rowkey) in sstables. Since v1.1, during 
> compaction, Cassandra no longer sorts on the token alone, but on the pair 
> (token, rowkey). So, if a custom partitioner produces token collisions 
> (that is, multiple rowkeys with the same token), it is not a problem with 
> v1.1 because, in order to retrieve a row, Cassandra compares the full pair 
> (token, rowkey).
> As the collision rate for this partitioner is unknown, it is expected to 
> be used with Cassandra 1.1 or above.
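
For readers without the attachment, the token computation described above, 
merge(getHighBits(md5(P1)), getLowBits(md5(P2))), might look roughly like the 
sketch below. This is not the code from RandomPartitioner4Pair.java. The class 
name PairTokenSketch, the method names, the String-typed key components and 
the 64/64-bit split of the 128-bit MD5 digest are all assumptions made for 
illustration; the description does not say how many bits come from each half.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch only; not the attached RandomPartitioner4Pair.java.
public class PairTokenSketch {

    // MD5 digest of a key component (always 16 bytes).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5")
                                .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed split: high 8 bytes come from md5(p1), low 8 bytes from md5(p2).
    static BigInteger token(String p1, String p2) {
        byte[] merged = new byte[16];
        System.arraycopy(md5(p1), 0, merged, 0, 8); // high bits of md5(P1)
        System.arraycopy(md5(p2), 8, merged, 8, 8); // low bits of md5(P2)
        return new BigInteger(1, merged);           // non-negative 128-bit value
    }

    public static void main(String[] args) {
        BigInteger a = token("dir42", "fileA");
        BigInteger b = token("dir42", "fileB");
        // Rows of the same directory share the upper half of the token,
        // so they fall into a narrow token range.
        System.out.println(a.shiftRight(64).equals(b.shiftRight(64))); // prints "true"
    }
}
```

Under these assumptions, all keys with the same P1 land in a token range of 
width 2^64 out of 2^128, which is why they would tend to be owned by the same 
node. Two pairs collide on the token only if their P2 digests agree on the 
low 64 bits.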

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
