[
https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405734#comment-17405734
]
Stefan Miklosovic edited comment on CASSANDRA-14309 at 8/27/21, 10:36 AM:
--------------------------------------------------------------------------
I am afraid we need to revisit the logic we are trying to introduce here.
Consider the following scenario:
This is in HintVerbHandler:
{code:java}
else if (!StorageProxy.instance.appliesLocally(hint.mutation))
{
    // the topology has changed, and we are no longer a replica of the mutation - since we don't know which node(s)
    // it has been handed over to, re-address the hint to all replicas; see CASSANDRA-5902.
    HintsService.instance.writeForAllReplicas(hint);
    respond(message);
}
{code}
and the implementation:
{code:java}
/**
 * Write a hint for all replicas. Used to re-dispatch hints whose destination is either missing or no longer correct.
 */
void writeForAllReplicas(Hint hint)
{
    String keyspaceName = hint.mutation.getKeyspaceName();
    Token token = hint.mutation.key().getToken();
    EndpointsForToken replicas =
        ReplicaLayout.forTokenWriteLiveAndDown(Keyspace.open(keyspaceName), token).all();

    // judicious use of streams: eagerly materializing probably cheaper
    // than performing filters / translations 2x extra via Iterables.filter/transform
    List<UUID> hostIds = replicas.stream()
                                 .filter(StorageProxy::shouldHint)
                                 .map(replica -> StorageService.instance.getHostIdForEndpoint(replica.endpoint()))
                                 .collect(Collectors.toList());

    write(hostIds, hint);
}
{code}
To guide a reader through this problem:
Node A sends a hint to node B, node B reaches the code above (snippets) and sees
that the hint is no longer meant for it because the topology changed in the
meantime. So it takes the replicas the hint is now for and calls shouldHint on
each of them; shouldHint checks whether the hint window for that node has
expired, because if it has, it no longer makes sense to send the hint there.
That check was previously based only on whether the window had indeed not
expired, but with this patch it also consults this "earliest hint" thing we are
trying to introduce here.
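For clarity, this is roughly how I read the combined decision. This is only a sketch, not the patch itself; getEarliestHintCreationTime is a made-up name for the "earliest hint" accessor this ticket would add:
{code:java}
// Sketch of the shouldHint decision as I understand it with this patch applied.
// Everything here is illustrative; getEarliestHintCreationTime is a hypothetical accessor.
static boolean shouldHintSketch(Replica replica)
{
    long maxHintWindow = DatabaseDescriptor.getMaxHintWindow();                    // max_hint_window_in_ms
    long downtime = Gossiper.instance.getEndpointDowntime(replica.endpoint());     // pre-patch criterion

    UUID hostId = StorageService.instance.getHostIdForEndpoint(replica.endpoint());
    // new criterion: how long have we already been accumulating hints for this host?
    long earliestHint = HintsService.instance.getEarliestHintCreationTime(hostId); // hypothetical
    long windowAlreadyUsed = earliestHint > 0 ? System.currentTimeMillis() - earliestHint : 0;

    return downtime <= maxHintWindow && windowAlreadyUsed <= maxHintWindow;
}
{code}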
But should that be part of the decision in this very specific case? Here the
decision would also be derived from whether node B already stores some earlier
hint for the target node, and I am just thinking through what that means in
this scenario.
In other words, when a node receives a hint that is no longer meant for it
because the topology changed, the decision whether some other node should
receive the re-dispatched hint should not take into account whether the
re-dispatching node happens to already have an earlier hint for that target.
Node B is now playing the role of "node A": it decides whether a hint should be
sent to some other node. The same logic was used previously, but it depended
only on the Gossiper reporting how long the node to be hinted had been down,
which was more or less the same value regardless of which node was asking. This
"earliest hint for a node", on the other hand, is specific to the local hint
store of the node doing the asking, and I am not sure it has anything to do
with the logic we are trying to introduce here.
I am not fully aware of the consequences of applying it in this example.
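If the persistent window really should not influence the re-dispatch path, one possible way out (purely a sketch; nothing like this is in the patch, and the boolean parameter name is made up) would be to let writeForAllReplicas opt out of that part of the check:
{code:java}
// Sketch only: an overload of shouldHint with a flag saying whether the
// per-node "earliest hint" window should be consulted at all.
public static boolean shouldHint(Replica replica, boolean considerEarliestHint)
{
    long maxHintWindow = DatabaseDescriptor.getMaxHintWindow();
    if (Gossiper.instance.getEndpointDowntime(replica.endpoint()) > maxHintWindow)
        return false;

    if (!considerEarliestHint)
        return true; // re-dispatch path: only the gossip-reported downtime matters

    UUID hostId = StorageService.instance.getHostIdForEndpoint(replica.endpoint());
    long earliestHint = HintsService.instance.getEarliestHintCreationTime(hostId); // hypothetical accessor, see sketch above
    return earliestHint <= 0 || System.currentTimeMillis() - earliestHint <= maxHintWindow;
}
{code}
writeForAllReplicas would then filter with shouldHint(replica, false) while the normal write path keeps using the full check.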
> Make hint window persistent across restarts
> -------------------------------------------
>
> Key: CASSANDRA-14309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14309
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Hints
> Reporter: Kurt Greaves
> Assignee: Stefan Miklosovic
> Priority: Low
> Fix For: 4.1
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The current hint system stores a window of hints as defined by
> {{max_hint_window_in_ms}}, however this window is not persistent across
> restarts.
> Examples (cluster with RF=3 and 3 nodes, A, B, and C):
> # A goes down
> # X ms of hints are stored for A on B and C
> # A is restarted
> # A goes down again without hints replaying from B and C
> # B and C will store up to another {{max_hint_window_in_ms}} of hints for A
>
> # A goes down
> # X ms of hints are stored for A on B and C
> # B is restarted
> # B will store up to another {{max_hint_window_in_ms}} of hints for A
>
> Note that in both these scenarios they can continue forever. If A or B keeps
> getting restarted hints will continue to pile up.
>
> Idea of this ticket is to stop this behaviour from happening and only ever
> store up to {{max_hint_window_in_ms}} of hints for a particular node.