[jira] [Comment Edited] (CASSANDRA-5902) Dealing with hints after a topology change

Branimir Lambov (JIRA) Wed, 24 Sep 2014 03:27:08 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146124#comment-14146124 ]


Branimir Lambov edited comment on CASSANDRA-5902 at 9/24/14 10:25 AM:
----------------------------------------------------------------------

It was a mistake not to add new tests. You are right, the code wasn't working 
correctly.

A new version is now uploaded, which adds tests, makes hint reporting to the 
response handler a little less obscure, fixes the issue with hints not being 
reported, and handles non-hintable replicas.

bq. Separately, it's not clear to me we should be stopping hint replay to the 
target if one of these extra hints fails to be delivered, since they're 
unrelated.

Are we stopping hint replay if a hint fails to be delivered? I don't think so: 
we only stop the current delivery cycle, because continuing would result in an 
unbreakable loop if a hint wasn't successfully deleted. We can't really delete 
a hint that wasn't successfully processed, but such a failure shouldn't happen 
now. (Note: it _can_ happen if shouldHint changes for a node between compiling 
the list and the time a hint is about to be written, but that will only happen 
due to TTL expiration, should be extremely rare, and will be sorted out during 
the next delivery cycle.)
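
To make the distinction between stopping a delivery cycle and stopping replay 
concrete, here is a minimal sketch in Java. It is not the code from the patch: 
the class HintCycleSketch, the method runDeliveryCycle, and the use of a plain 
Deque and Predicate are all assumptions made for illustration. It only models 
the rule described above: a hint is deleted only after it was processed, and a 
failure ends the current cycle while leaving the remaining hints queued.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical model of one hint delivery cycle; not Cassandra's actual classes.
final class HintCycleSketch {
    /**
     * Runs a single delivery cycle over the pending hints.
     * Returns how many hints were delivered and deleted in this cycle.
     */
    static <H> int runDeliveryCycle(Deque<H> pending, Predicate<H> deliver) {
        int delivered = 0;
        while (!pending.isEmpty()) {
            H hint = pending.peekFirst();
            if (!deliver.test(hint)) {
                // The hint was not processed, so it must not be deleted.
                // Retrying it right away could loop forever, so end this cycle;
                // the hint stays queued and is retried in the next cycle.
                break;
            }
            pending.removeFirst(); // delete only after successful processing
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(Arrays.asList("h1", "h2", "h3"));
        // First cycle: "h2" fails, so the cycle stops with h2 and h3 still queued.
        int first = runDeliveryCycle(pending, h -> !h.equals("h2"));
        // Second cycle: every delivery succeeds and the queue drains.
        int second = runDeliveryCycle(pending, h -> true);
        System.out.println(first + " " + second + " " + pending.isEmpty()); // prints "1 2 true"
    }
}
```

In this model a failed hint costs one cycle, not the whole replay: the second 
call picks up exactly where the first one stopped.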


was (Author: blambov):
It was a mistake not to add new tests. You are right, the code wasn't working 
correctly.

A new version is now uploaded, which adds tests, makes hint reporting to the 
response handler a little less obscure, fixes the issue with hints not being 
reported, and handles non-hintable replicas. It also switches to directly 
sending messages to all replicas, because as far as I can see 
sendToHintedEndpoints does not track timeouts for remote datacentre replicas 
and thus cannot write or report hints for failures from them.

bq. Separately, it's not clear to me we should be stopping hint replay to the 
target if one of these extra hints fails to be delivered, since they're 
unrelated.

Are we stopping hint replay if a hint fails to be delivered? I don't think so: 
we only stop the current delivery cycle, because continuing would result in an 
unbreakable loop if a hint wasn't successfully deleted. We can't really delete 
a hint that wasn't successfully processed, but such a failure shouldn't happen 
now. (Note: it _can_ happen if shouldHint changes for a node between compiling 
the list and the time a hint is about to be written, but that will only happen 
due to TTL expiration, should be extremely rare, and will be sorted out during 
the next delivery cycle.)

> Dealing with hints after a topology change
> ------------------------------------------
>
>                 Key: CASSANDRA-5902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5902
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Branimir Lambov
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> Hints are stored and delivered by destination node id.  This allows them to 
> survive IP changes in the target, while making "scan all the hints for a 
> given destination" an efficient operation.  However, we do not detect and 
> handle a new node assuming responsibility for the hinted row via bootstrap 
> before the hint can be delivered.
> I think we have to take a performance hit in this case -- we need to deliver 
> such a hint to *all* replicas, since we don't know which is the "new" one.  
> This happens infrequently enough, however -- requiring first the target node 
> to be down to create the hint, then the hint owner to be down long enough for 
> the target to both recover and stream to a new node -- that this should be 
> okay.
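
The fallback proposed in the description can be sketched as follows. This is 
an illustration only, not the committed implementation: the class 
HintTargetSketch, the method deliveryTargets, and in particular the idea of 
comparing a ring version recorded with the hint against the current one are 
assumptions. The ticket does not say how a topology change is detected. The 
sketch shows only the decision itself: deliver to the hinted node when nothing 
changed, otherwise to all current replicas, since the "new" one is unknown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical helper; names and the ring-version check are illustrative assumptions.
final class HintTargetSketch {
    /**
     * Chooses where a stored hint should be replayed.
     *
     * @param hintedNode         node id the hint was originally stored for
     * @param ringVersionAtWrite assumed topology stamp saved with the hint
     * @param currentRingVersion assumed topology stamp at replay time
     * @param currentReplicas    node ids currently responsible for the hinted row
     */
    static List<UUID> deliveryTargets(UUID hintedNode,
                                      long ringVersionAtWrite,
                                      long currentRingVersion,
                                      List<UUID> currentReplicas) {
        if (ringVersionAtWrite == currentRingVersion) {
            // Cheap, common case: ownership is unchanged, one target is enough.
            return Collections.singletonList(hintedNode);
        }
        // Ownership may have moved and we cannot tell which replica is new,
        // so take the performance hit and deliver to every current replica.
        return currentReplicas;
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        List<UUID> replicas = Arrays.asList(a, b);
        System.out.println(deliveryTargets(a, 1L, 1L, replicas).size()); // unchanged topology
        System.out.println(deliveryTargets(a, 1L, 2L, replicas).size()); // changed topology
    }
}
```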



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
