[
https://issues.apache.org/jira/browse/CASSANDRA-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532046#comment-16532046
]
Jay Zhuang commented on CASSANDRA-14555:
----------------------------------------
Thanks [~beobal] for pointing out the issue. I think CASSANDRA-14252 is a further
fix for CASSANDRA-13074, which fixes the same issue as CASSANDRA-2662:
{quote}[~brandon.williams]:
Given coordinator A, and replicas X, Y, and Z (in subsnitch order), on the
first round X will be chosen, and let's say it receives a score of 1. With the
patch, at this point Y and Z will be initialized with zero. On the next round,
Y will be chosen, and let's say it receives a score of or near 1, depending on
network latency. On the third round, Z will be chosen, and let's say it also
receives a score similar to Y. Now the cache is hot on all nodes, and
subsequent reads have the possibility to oscillate between all three based on
network latency variance. This can be mitigated though with the badness
threshold. With the badness threshold on, the first round will occur as before,
but subsequent rounds will continue to use X until it degrades past the
threshold, at which point they will use Y, until the dynamic snitch reset()s,
at which point everything will repeat. I don't think this is harmful to
CASSANDRA-1314 after all.
https://issues.apache.org/jira/browse/CASSANDRA-2662?focusedCommentId=13035597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13035597
{quote}
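For anyone not familiar with the badness threshold mentioned in the quote, here is a simplified sketch of the idea. It is not the actual {{DynamicEndpointSnitch}} code: the class name {{BadnessThresholdSketch}}, the use of plain strings as endpoints, and treating a missing score as zero are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of badness-threshold ordering.
// Endpoints are plain strings; a lower score means a better replica.
class BadnessThresholdSketch
{
    static List<String> sort(List<String> subsnitchOrder,
                             Map<String, Double> scores,
                             double badnessThreshold)
    {
        // Scores in subsnitch order. Assumption: an endpoint with no score counts as 0.
        List<Double> subsnitchScores = new ArrayList<>();
        for (String endpoint : subsnitchOrder)
            subsnitchScores.add(scores.getOrDefault(endpoint, 0.0));

        // The best achievable ordering of those same scores.
        List<Double> sortedScores = new ArrayList<>(subsnitchScores);
        Collections.sort(sortedScores);

        // Compare position by position. If any subsnitch-ordered score is worse than
        // the best score for that position by more than the threshold, give up on
        // the subsnitch order and order purely by score.
        for (int i = 0; i < subsnitchScores.size(); i++)
        {
            if (subsnitchScores.get(i) > sortedScores.get(i) * (1.0 + badnessThreshold))
            {
                List<String> byScore = new ArrayList<>(subsnitchOrder);
                // List.sort is stable, so ties keep their subsnitch order.
                byScore.sort(Comparator.comparingDouble(e -> scores.getOrDefault(e, 0.0)));
                return byScore;
            }
        }
        return new ArrayList<>(subsnitchOrder);
    }
}
```

In this sketch, with a threshold of 0.1 and scores X=1.0, Y=1.05, Z=1.1, the subsnitch order X, Y, Z is kept. If X degrades to 2.0 while Y and Z sit at 1.0, the result becomes Y, Z, X. Note that when Z has no score at all and X and Y are both at 1.0, the zero for Z makes the comparison trip and Z ends up first.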
It's definitely a valid point that data streaming may prefer a local node (by
{{subsnitch}}) to an unknown node. In the example above, let's say replica Z is
in another DC: streaming may not want to select Z unless X and Y are really
bad. But there's a chance that Z could be selected (even without
CASSANDRA-14252) when there's no score for it.
So maybe we could introduce another {{sortByProximity()}} variant that excludes
zero-score endpoints for streaming, or have a separate
{{dynamic_snitch_badness_threshold}} for streaming (with a
[change|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java#L251]
to support zero scores).
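One possible reading of the first option, as a rough sketch only: endpoints without a positive score are left out of the score comparison and kept after the scored ones in subsnitch order, so an unscored remote replica is never promoted. The class and method names ({{StreamingProximitySketch}}, {{sortForStreaming}}) are hypothetical and do not exist in Cassandra, and endpoints are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical streaming-oriented ordering; not part of Cassandra.
class StreamingProximitySketch
{
    static List<String> sortForStreaming(List<String> subsnitchOrder,
                                         Map<String, Double> scores,
                                         double badnessThreshold)
    {
        // Split endpoints into those with a positive score and those without one.
        List<String> scored = new ArrayList<>();
        List<String> unscored = new ArrayList<>();
        for (String endpoint : subsnitchOrder)
        {
            if (scores.getOrDefault(endpoint, 0.0) > 0.0)
                scored.add(endpoint);
            else
                unscored.add(endpoint);
        }

        // Score-ordered view of the scored endpoints only (stable sort keeps ties
        // in subsnitch order).
        List<String> byScore = new ArrayList<>(scored);
        byScore.sort(Comparator.comparingDouble(e -> scores.get(e)));

        // Positional badness check, restricted to endpoints that have a score.
        boolean degraded = false;
        for (int i = 0; i < scored.size(); i++)
        {
            double subsnitchScore = scores.get(scored.get(i));
            double bestScore = scores.get(byScore.get(i));
            if (subsnitchScore > bestScore * (1.0 + badnessThreshold))
            {
                degraded = true;
                break;
            }
        }

        // Unscored endpoints always come last, in their subsnitch order.
        List<String> result = new ArrayList<>(degraded ? byScore : scored);
        result.addAll(unscored);
        return result;
    }
}
```

With X=1.0, Y=1.0 and no score for Z, this returns X, Y, Z; if X degrades to 3.0 it returns Y, X, Z. Whether unscored endpoints should go last or simply keep their subsnitch position is a design choice this sketch does not settle.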
Overall, I don't think CASSANDRA-14252 should be reverted, and I don't think it
should block the {{3.0.17}} and {{3.11.3}} releases. cc [~jkni], [~tjake],
[~brandon.williams].
> Verify effect of CASSANDRA-14252 on streaming endpoint selection
> ----------------------------------------------------------------
>
> Key: CASSANDRA-14555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14555
> Project: Cassandra
> Issue Type: Task
> Components: Streaming and Messaging
> Reporter: Sam Tunnicliffe
> Priority: Major
> Fix For: 4.x
>
>
> CASSANDRA-14252 makes a slight change to {{DynamicEndpointSnitch}} so that it
> is somewhat more likely a replica in a remote DC is contacted when replicas
> in the local DC are considered degraded. This seems reasonable on the read
> path, but it could also affect selection of endpoints for streaming and cross
> DC streaming is probably something that operators want to control more
> tightly. To be clear, I’m not 100% sure that this is actually an issue, but
> I’d like to have some investigation into it before we ship a change to
> default behaviour.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)