[ 
https://issues.apache.org/jira/browse/CASSANDRA-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528973#comment-13528973
 ] 

Dominique De Vito commented on CASSANDRA-5054:
----------------------------------------------

Let's analyze the amount of data exchanged between Cassandra nodes for 
RandomPartitioner and RandomPartitioner4Pair (the newly submitted partitioner).

The test is the following:
* we read/write columns associated with rowkeys like (P1, _), where P1 is 
fixed and the second element of the pair varies.
* N is the number of nodes in the cluster
* L is the number of lines (rows) read or written for a given replica during the test
* all cluster nodes are used as (possible) coordinators.
* RF=3

We evaluate the cost of the exchanges between the coordinator and the other 
nodes to determine the cost of a response for each partitioner.


* RandomPartitioner

Hypothesis: each node stores 1/N of all the lines, and the lines read or 
written by the test are evenly distributed across the cluster.

So the coordinator itself stores L / N of the lines needed to answer the 
request. For those lines it exchanges data with 2 other replicas; for the 
remaining L - L/N lines it exchanges data with 3 replicas.

The number of lines exchanged between the coordinator and the other nodes is: 
2 * L/N + 3 * (L - L/N)

The number of communications between the coordinator and other nodes is: N-1
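
The RandomPartitioner estimate above can be checked numerically. The following is a minimal Python sketch of the cost model exactly as stated in this comment; the function names are illustrative and are not part of Cassandra.

```python
from fractions import Fraction

def random_partitioner_lines(L, N):
    """Lines exchanged between the coordinator and the other nodes (RF=3).

    Follows the hypothesis above: the coordinator holds L/N of the lines
    (2 remote copies each) and none of the remaining L - L/N lines
    (3 remote copies each).
    """
    local = Fraction(L, N)
    return 2 * local + 3 * (L - local)

def random_partitioner_communications(N):
    """The coordinator contacts every other node of the cluster."""
    return N - 1

print(random_partitioner_lines(600, 6))       # 1700
print(random_partitioner_communications(6))   # 5
```

Fractions are used so that L/N stays exact when L is not a multiple of N.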


* RandomPartitioner4Pair (new partitioner)

Since the requested rowkeys are of the form (P1, _), with P1 fixed, all rowkeys 
are expected to be stored on 3 nodes only (due to RF=3).

2 cases:
1) the coordinator is one of the 3 nodes storing the requested lines
2) the coordinator is not one of those 3 nodes

For lines:
1) the number of lines exchanged between the coordinator and the other nodes is: 
2 * L
2) the number of lines exchanged between the coordinator and the other nodes is: 
3 * L

On average (3 of the N possible coordinators are in case 1, the other N-3 in 
case 2): (3 * 2 * L + (N-3) * 3 * L) / N

For communications:
1) the number of communications between the coordinator and the other nodes is: 2
2) the number of communications between the coordinator and the other nodes is: 3

On average: (3 * 2 + (N-3) * 3) / N
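
The two averages for RandomPartitioner4Pair can be written as a small Python sketch. It only encodes the two cases listed above (3 coordinators that are replicas, N-3 that are not); the function name is illustrative and N >= 3 is assumed.

```python
from fractions import Fraction

def pair_partitioner_average(L, N):
    """Average cost over all N possible coordinators (RF=3, N >= 3).

    3 coordinators are replicas:     2 remote nodes, 2 * L lines.
    N - 3 coordinators are not:      3 remote nodes, 3 * L lines.
    """
    lines = Fraction(3 * 2 * L + (N - 3) * 3 * L, N)
    communications = Fraction(3 * 2 + (N - 3) * 3, N)
    return lines, communications

lines, comms = pair_partitioner_average(600, 6)
print(lines, comms)  # 1500 5/2
```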


* Conclusion (1/2)

- For the number of lines exchanged between the coordinator and the other nodes: 
both partitioners are roughly equal.

RandomPartitioner4Pair: (3 * 2 * L + (N-3) * 3 * L) / N
= 3 * L * (N-1) / N
= L * (3 * N - 3) / N

RandomPartitioner: 2 * L/N + 3 * (L - L/N)
= 3 * L - L / N
= L * (3 - 1 / N)
= L * (3 * N - 1) / N

In fact, RandomPartitioner4Pair has a slight advantage over RandomPartitioner: 
cost(RandomPartitioner4Pair) < cost(RandomPartitioner), the difference being 
2 * L / N lines.
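
As a sanity check of the algebra, the following Python sketch compares the unsimplified and simplified forms of both costs over a range of cluster sizes. The function names are illustrative; the formulas are the ones given in this comment.

```python
from fractions import Fraction

def cost_pair(L, N):
    # Unsimplified average for RandomPartitioner4Pair
    return Fraction(3 * 2 * L + (N - 3) * 3 * L, N)

def cost_random(L, N):
    # Unsimplified cost for RandomPartitioner
    return 2 * Fraction(L, N) + 3 * (L - Fraction(L, N))

for N in range(3, 50):
    for L in (1, 10, 1000):
        assert cost_pair(L, N) == Fraction(L * (3 * N - 3), N)
        assert cost_random(L, N) == Fraction(L * (3 * N - 1), N)
        # The gap is always 2 * L / N, so it shrinks as the cluster grows.
        assert cost_random(L, N) - cost_pair(L, N) == Fraction(2 * L, N)

print(cost_random(1000, 10) - cost_pair(1000, 10))  # 200
```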

Here, all cluster nodes are used as (possible) coordinators, so the costs 
calculated above are average costs.
So, on average, RandomPartitioner4Pair (the newly submitted partitioner) is 
slightly better than RandomPartitioner.

However, if we use Astyanax, it is possible to choose the coordinator among the 
replicas.
In that case, RandomPartitioner4Pair is always in case 1 (2 * L lines, 2 
communications), and its cost is then much lower than that of RandomPartitioner.

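- For the number of communications between the coordinator and the other nodes: 
RandomPartitioner4Pair has a clear advantage as soon as N > 3: 3 * (N-1) / N on 
average (fewer than 3), against N-1 for RandomPartitioner.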
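
To make the gap in communications concrete, here is a short Python sketch that tabulates both counts for a few cluster sizes. The function names are illustrative. Under the formulas in this comment, the ratio between the two counts works out to N / 3.

```python
from fractions import Fraction

def comms_random(N):
    # RandomPartitioner: the coordinator contacts every other node
    return N - 1

def comms_pair_average(N):
    # RandomPartitioner4Pair: average over all N possible coordinators
    return Fraction(3 * 2 + (N - 3) * 3, N)

for N in (3, 6, 12, 48):
    ratio = comms_random(N) / comms_pair_average(N)
    assert ratio == Fraction(N, 3)
    print(N, comms_random(N), comms_pair_average(N), ratio)
```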


* Conclusion (2/2)

RandomPartitioner4Pair (the newly submitted partitioner) is slightly better than 
RandomPartitioner on average, and much better than RandomPartitioner when 
used with Astyanax and the coordinator is chosen among the replicas.



                
> new partitioner for rowkey pairs (P1, P2) targeting hierarchical data
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-5054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5054
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Contrib, Core
>            Reporter: Dominique De Vito
>            Priority: Trivial
>         Attachments: RandomPartitioner4Pair.java
>
>
> This new partitioner is submitted here both for validation and as a proposal 
> to other Cassandra users (maybe this partitioner has its place within 
> Cassandra core, or in some 'contrib' directory).
> This new partitioner is a variant of RandomPartitioner with special token 
> computation for composite rowkeys of size 2.
> The use case of this partitioner is rowkeys for hierarchical data, that 
> is, rowkeys like (directory id, file id).
> This partitioner computes tokens so that rows with the same "directory id" are 
> very likely to be on the same node: the goal is that, when Cassandra is 
> asked about multiple file ids for the same directory id, only a limited 
> number of nodes need to be queried.
> So, in case of a composite rowkey of size 2 like (P1, P2), the partitioner 
> computes the token as follows: <code>merge(getHighBits(md5(P1)), 
> getLowBits(md5(P2)))</code>.
> In case of a rowkey that is NOT a pair, the partitioner returns the same 
> value as RandomPartitioner.
> This partitioner is expected to be used with Cassandra 1.1 or above.
> Cassandra stores the pair (token, rowkey) in sstables. Since v1.1, during a 
> compaction phase, Cassandra no longer sorts on the token only, but 
> on the pair (token, rowkey). So, if a custom partitioner produces token 
> collisions (that is, multiple rowkeys with the same token), it won't be a 
> problem with v1.1 because, in order to retrieve a row, Cassandra compares 
> the full pair (token, rowkey).
> As the percentage of collisions for this partitioner is unknown, it is 
> expected to be used with Cassandra 1.1 or above.
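
The token formula quoted in the description, merge(getHighBits(md5(P1)), getLowBits(md5(P2))), can be illustrated with a short Python sketch. This is not the attached RandomPartitioner4Pair.java: the 64/64 bit split, the unsigned interpretation of the digest, and the byte-string encoding of P1 and P2 are all assumptions made for illustration.

```python
import hashlib

HALF_BITS = 64                      # assumption: 128-bit MD5 split into two 64-bit halves
LOW_MASK = (1 << HALF_BITS) - 1

def md5_int(data: bytes) -> int:
    """MD5 digest of data as an unsigned 128-bit integer (assumed encoding)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def high_bits(h: int) -> int:
    return h >> HALF_BITS

def low_bits(h: int) -> int:
    return h & LOW_MASK

def pair_token(p1: bytes, p2: bytes) -> int:
    """Sketch of merge(getHighBits(md5(P1)), getLowBits(md5(P2)))."""
    return (high_bits(md5_int(p1)) << HALF_BITS) | low_bits(md5_int(p2))

t1 = pair_token(b"dir-42", b"file-a")
t2 = pair_token(b"dir-42", b"file-b")
# Same directory id: the tokens share their high half, so they sit close
# together on the ring and are likely to land on the same replicas.
assert high_bits(t1) == high_bits(t2)
```

With this layout, all rows of one directory fall within a token range of width 2^64 whose position is determined by md5(P1) alone.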
