[
https://issues.apache.org/jira/browse/CASSANDRA-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-2890:
----------------------------------------
Attachment: 2890.patch
Ok, I hadn't understood the "the lowest value IP address is always chosen"
remark, but as it turns out the DynamicSnitch uses the underlying snitch's
compareEndpoints() method if two endpoints have the same score (which includes
the case where they have no scores at all, for instance because no reads have
been done). And the SimpleSnitch compareEndpoints() method happens to compare
the endpoints by IP address. I'm not sure this is a really good choice because:
# this is not consistent with the sortEndpointByProximity sorting
# this doesn't correspond to how the NetworkTopologySnitch compareEndpoints()
works when restricted to only one datacenter (it treats all nodes as equal).
So I think there may be something to change in there, but anyway that is not
completely relevant to this issue.
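
To make the tie-break behaviour described above concrete, here is a minimal
sketch. It is not the actual Cassandra code: the class TieBreakSketch, its
methods, and the choice to treat a missing score as 0.0 are all assumptions
made for illustration. It only shows why, when no endpoint has a score yet,
an address-based fallback always puts the lowest IP address first.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Comparator;
import java.util.Map;

// Hypothetical sketch; these are not Cassandra's real classes or signatures.
class TieBreakSketch {
    // Helper to build an IPv4 address without a DNS lookup.
    static InetAddress addr(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Stand-in for a SimpleSnitch-like comparison: order by raw address bytes.
    static int compareByAddress(InetAddress x, InetAddress y) {
        byte[] p = x.getAddress();
        byte[] q = y.getAddress();
        for (int i = 0; i < Math.min(p.length, q.length); i++) {
            int cmp = Integer.compare(p[i] & 0xff, q[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(p.length, q.length);
    }

    // Stand-in for a score-based comparison (lower score sorts first) that
    // falls back to the address comparison when the scores are equal.
    // Assumption: an endpoint with no recorded score is treated as 0.0.
    static Comparator<InetAddress> byScoreThenAddress(Map<InetAddress, Double> scores) {
        return (x, y) -> {
            int cmp = Double.compare(scores.getOrDefault(x, 0.0), scores.getOrDefault(y, 0.0));
            return cmp != 0 ? cmp : compareByAddress(x, y);
        };
    }
}
```

Sorting a replica list with byScoreThenAddress and an empty score map yields
plain IP order, so the same node would always come out first.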
On this issue, I agree with Stu that tracking the latency and using that is
probably the best solution (though that does presuppose the use of the
DynamicSnitch). However, this is not so simple, because the latency we have (on
the coordinator) is the latency of the whole counter write. That is, it
includes not only the read by the first replica, but also the local write (not
a big deal) and the latencies of the writes to the other replicas. Those last
ones depend on the consistency level, for instance. Also, a TimeoutException
does not necessarily mean that the first replica is to blame; it could be that
enough other replicas timed out (so that the consistency level wasn't achieved).
There may be solutions to those problems, but I don't see any simple ones. Now,
as there are reports that this is a problem "in the wild", I propose we go for
the simple "randomize" solution for now and push this directly to the 0.8
series. Attaching a patch (against 0.8) for this. Then we can open another
ticket to improve on that.
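
For illustration only, the "randomize" idea could look roughly like the
sketch below. This is not the attached patch: RandomLeaderSketch, pickLeader,
the use of plain strings for endpoints and data centers, and the fallback to
the first snitch-ordered replica when no replica is local are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch; names and types are illustrative, not Cassandra's.
class RandomLeaderSketch {
    // Picks uniformly at random among the replicas located in the
    // coordinator's data center. Assumption: if no replica is local, fall
    // back to the first replica in the snitch-provided order.
    static String pickLeader(List<String> snitchOrdered,
                             Map<String, String> dcOf,
                             String localDc,
                             Random rng) {
        if (snitchOrdered.isEmpty()) {
            throw new IllegalArgumentException("no live replicas");
        }
        List<String> local = new ArrayList<>();
        for (String replica : snitchOrdered) {
            if (localDc.equals(dcOf.get(replica))) {
                local.add(replica);
            }
        }
        if (local.isEmpty()) {
            return snitchOrdered.get(0);
        }
        return local.get(rng.nextInt(local.size()));
    }
}
```

With two local replicas, repeated calls spread the "first replica" role over
both of them instead of always landing on whichever one the snitch sorts
first.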
> Randomize (to some extend) the choice of the first replica for counter
> increment
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-2890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2890
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.0
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Labels: counters
> Fix For: 0.8.6
>
> Attachments: 2890.patch
>
>
> Right now, we choose the first replica for a counter increment based solely
> on what the snitch returns. If the client requests are well balanced over
> the cluster and the snitch is not misconfigured, this should not be a problem,
> but this is probably too strong an assumption to make.
> The goal of this ticket is to change this to choose a random replica in the
> current data center instead.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira