[
https://issues.apache.org/jira/browse/CASSANDRA-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405734#comment-17405734
]
Stefan Miklosovic edited comment on CASSANDRA-14309 at 8/27/21, 10:36 AM:
--------------------------------------------------------------------------
I am afraid we need to revisit the logic we are trying to introduce here.
Consider the following scenario:
This is in HintVerbHandler:
{code:java}
else if (!StorageProxy.instance.appliesLocally(hint.mutation))
{
    // the topology has changed, and we are no longer a replica of the mutation - since we don't know which node(s)
    // it has been handed over to, re-address the hint to all replicas; see CASSANDRA-5902.
    HintsService.instance.writeForAllReplicas(hint);
    respond(message);
}
{code}
and the implementation:
{code:java}
/**
 * Write a hint for all replicas. Used to re-dispatch hints whose destination is either missing or no longer correct.
 */
void writeForAllReplicas(Hint hint)
{
    String keyspaceName = hint.mutation.getKeyspaceName();
    Token token = hint.mutation.key().getToken();
    EndpointsForToken replicas =
        ReplicaLayout.forTokenWriteLiveAndDown(Keyspace.open(keyspaceName), token).all();

    // judicious use of streams: eagerly materializing probably cheaper
    // than performing filters / translations 2x extra via Iterables.filter/transform
    List<UUID> hostIds = replicas.stream()
                                 .filter(StorageProxy::shouldHint)
                                 .map(replica -> StorageService.instance.getHostIdForEndpoint(replica.endpoint()))
                                 .collect(Collectors.toList());

    write(hostIds, hint);
}
{code}
To guide a reader through this problem:
Node A sends a hint to node B, node B reaches the code above (snippets) and sees
that the hint is no longer meant for it because the topology changed in the
meantime. So it takes the replicas the hint is now for and calls shouldHint on
each of them; shouldHint checks whether the hint window for that node has
expired, because if it has, it no longer makes sense to send the hint there.
That check was previously based only on whether the window had indeed not
expired, but with this patch it also consults this "earliest hint" thing we are
trying to introduce here.
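For clarity, this is roughly how I read the combined decision. This is only a sketch, not the patch itself; getEarliestHintCreationTime is a made-up name for the "earliest hint" accessor this ticket would add:
{code:java}
// Sketch of the shouldHint decision as I understand it with this patch applied.
// Everything here is illustrative; getEarliestHintCreationTime is a hypothetical accessor.
static boolean shouldHintSketch(Replica replica)
{
    long maxHintWindow = DatabaseDescriptor.getMaxHintWindow();                    // max_hint_window_in_ms
    long downtime = Gossiper.instance.getEndpointDowntime(replica.endpoint());     // pre-patch criterion

    UUID hostId = StorageService.instance.getHostIdForEndpoint(replica.endpoint());
    // new criterion: how long have we already been accumulating hints for this host?
    long earliestHint = HintsService.instance.getEarliestHintCreationTime(hostId); // hypothetical
    long windowAlreadyUsed = earliestHint > 0 ? System.currentTimeMillis() - earliestHint : 0;

    return downtime <= maxHintWindow && windowAlreadyUsed <= maxHintWindow;
}
{code}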
But should that be part of the decision in this very specific case? Here the
decision would also be derived from whether node B already stores some earlier
hint for the target node, and I am just thinking through what that means in
this scenario.
In other words, when a node receives a hint that is no longer meant for it
because the topology changed, the decision whether some other node should
receive the re-dispatched hint should not take into account whether the
re-dispatching node happens to already have an earlier hint for that target.
Node B is now playing the role of "node A": it decides whether a hint should be
sent to some other node. The same logic was used previously, but it depended
only on the Gossiper reporting how long the node to be hinted had been down,
which was more or less the same value regardless of which node was asking. This
"earliest hint for a node", on the other hand, is specific to the local hint
store of the node doing the asking, and I am not sure it has anything to do
with the logic we are trying to introduce here.
I am not fully aware of the consequences of applying it in this example.
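If the persistent window really should not influence the re-dispatch path, one possible way out (purely a sketch; nothing like this is in the patch, and the boolean parameter name is made up) would be to let writeForAllReplicas opt out of that part of the check:
{code:java}
// Sketch only: an overload of shouldHint with a flag saying whether the
// per-node "earliest hint" window should be consulted at all.
public static boolean shouldHint(Replica replica, boolean considerEarliestHint)
{
    long maxHintWindow = DatabaseDescriptor.getMaxHintWindow();
    if (Gossiper.instance.getEndpointDowntime(replica.endpoint()) > maxHintWindow)
        return false;

    if (!considerEarliestHint)
        return true; // re-dispatch path: only the gossip-reported downtime matters

    UUID hostId = StorageService.instance.getHostIdForEndpoint(replica.endpoint());
    long earliestHint = HintsService.instance.getEarliestHintCreationTime(hostId); // hypothetical accessor, see sketch above
    return earliestHint <= 0 || System.currentTimeMillis() - earliestHint <= maxHintWindow;
}
{code}
writeForAllReplicas would then filter with shouldHint(replica, false) while the normal write path keeps using the full check.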
> Make hint window persistent across restarts
> -------------------------------------------
>
> Key: CASSANDRA-14309
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14309
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Hints
> Reporter: Kurt Greaves
> Assignee: Stefan Miklosovic
> Priority: Low
> Fix For: 4.1
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The current hint system stores a window of hints as defined by
> {{max_hint_window_in_ms}}, however this window is not persistent across
> restarts.
> Examples (cluster with RF=3 and 3 nodes, A, B, and C):
> # A goes down
> # X ms of hints are stored for A on B and C
> # A is restarted
> # A goes down again without hints replaying from B and C
> # B and C will store up to another {{max_hint_window_in_ms}} of hints for A
>
> # A goes down
> # X ms of hints are stored for A on B and C
> # B is restarted
> # B will store up to another {{max_hint_window_in_ms}} of hints for A
>
> Note that in both these scenarios they can continue forever. If A or B keeps
> getting restarted hints will continue to pile up.
>
> Idea of this ticket is to stop this behaviour from happening and only ever
> store up to {{max_hint_window_in_ms}} of hints for a particular node.