[
https://issues.apache.org/jira/browse/CASSANDRA-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146124#comment-14146124
]
Branimir Lambov edited comment on CASSANDRA-5902 at 9/24/14 10:25 AM:
----------------------------------------------------------------------
It was a mistake not to add new tests. You are right, the code wasn't working
correctly.
A new version is now uploaded, which adds tests, makes hint reporting to the
response handler a little less obscure, fixes the issue with hints not being
reported, and handles non-hintable replicas.
bq. Separately, it's not clear to me we should be stopping hint replay to the
target if one of these extra hints fails to be delivered, since they're
unrelated.
Are we stopping hint replay if a hint fails to be delivered? I don't think so:
we only stop the current delivery cycle, since continuing would result in an
unbreakable loop if a hint wasn't successfully deleted. We can't really delete
a hint that wasn't successfully processed, but that shouldn't happen now.
(Note: it _can_ happen if shouldHint changes for a node between compiling the
list and the time a hint is about to be written, but that can only happen due
to TTL expiration, should be extremely rare, and will be sorted out during the
next delivery cycle.)
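To illustrate the distinction between stopping replay and stopping only the
current cycle, here is a minimal sketch. It is not the patch: the class, the
method, and the use of a Deque as the hint store are all hypothetical, chosen
only to show that a failed hint stays in the store and the cycle ends rather
than retrying it in place.

```java
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
final class HintDeliveryCycleSketch {
    /**
     * Runs one delivery cycle over the stored hints, oldest first.
     * A hint is deleted only after it has been processed successfully.
     * On the first failure the cycle ends; the failed hint and every hint
     * behind it stay in the store for the next cycle.
     *
     * @return the number of hints delivered and deleted in this cycle
     */
    static <H> int runCycle(Deque<H> store, Predicate<H> deliver) {
        int delivered = 0;
        while (!store.isEmpty()) {
            H hint = store.peekFirst();
            if (!deliver.test(hint)) {
                // The hint was not processed, so it must not be deleted.
                // Looping again here would pick up the same hint forever.
                break;
            }
            store.pollFirst(); // processed successfully, safe to delete
            delivered++;
        }
        return delivered;
    }
}
```

A later call to runCycle on the same store picks up from the hint that failed,
which is the "sorted out during the next delivery cycle" behaviour.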
was (Author: blambov):
It was a mistake not to add new tests. You are right, the code wasn't working
correctly.
A new version is now uploaded, which adds tests, makes hint reporting to the
response handler a little less obscure, fixes the issue with hints not being
reported, and handles non-hintable replicas. It also switches to directly
sending messages to all replicas, because as far as I can see
sendToHintedEndpoints does not track timeouts for remote datacentre replicas
and thus cannot write or report hints for failures from them.
bq. Separately, it's not clear to me we should be stopping hint replay to the
target if one of these extra hints fails to be delivered, since they're
unrelated.
Are we stopping hint replay if a hint fails to be delivered? I don't think so:
we only stop the current delivery cycle, since continuing would result in an
unbreakable loop if a hint wasn't successfully deleted. We can't really delete
a hint that wasn't successfully processed, but that shouldn't happen now.
(Note: it _can_ happen if shouldHint changes for a node between compiling the
list and the time a hint is about to be written, but that can only happen due
to TTL expiration, should be extremely rare, and will be sorted out during the
next delivery cycle.)
> Dealing with hints after a topology change
> ------------------------------------------
>
> Key: CASSANDRA-5902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5902
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Branimir Lambov
> Priority: Minor
> Fix For: 2.1.1
>
>
> Hints are stored and delivered by destination node id. This allows them to
> survive IP changes in the target, while making "scan all the hints for a
> given destination" an efficient operation. However, we do not detect and
> handle a new node assuming responsibility for the hinted row via bootstrap
> before the hint can be delivered.
> I think we have to take a performance hit in this case -- we need to deliver
> such a hint to *all* replicas, since we don't know which is the "new" one.
> This happens infrequently enough, however -- requiring first the target node
> to be down to create the hint, then the hint owner to be down long enough for
> the target to both recover and stream to a new node -- that this should be
> okay.
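
The delivery rule proposed in the description can be sketched roughly as
follows. This is only an illustration: the class and method names are
hypothetical, node ids are plain strings, and the topologyChanged flag is an
assumed input, since the description does not say how a change would be
detected.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
final class HintTargetsSketch {
    /**
     * Chooses the nodes a stored hint should be replayed to.
     *
     * @param originalTarget  node id the hint was stored under
     * @param currentReplicas nodes that currently own the hinted row
     * @param topologyChanged assumed flag: ownership may have moved since the
     *                        hint was written
     */
    static Set<String> targets(String originalTarget,
                               List<String> currentReplicas,
                               boolean topologyChanged) {
        Set<String> out = new LinkedHashSet<>();
        if (topologyChanged) {
            // We cannot tell which replica is the "new" one, so take the
            // performance hit and deliver to all of them.
            out.addAll(currentReplicas);
        } else {
            // Common case: deliver only to the node the hint was stored for.
            out.add(originalTarget);
        }
        return out;
    }
}
```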