[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

Peter Schuller (JIRA) Mon, 18 Apr 2011 10:14:48 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021118#comment-13021118
 ]


Peter Schuller commented on CASSANDRA-2405:
-------------------------------------------

If I'm reading the code correctly, then no, I mean earlier. Recall that the 
reason AES matters w.r.t. GC grace seconds is that, for it to be safe to 
remove a tombstone for some piece of data, that data must be guaranteed to 
have become consistent across the cluster up to the start of the gc grace 
period (counted back from the moment of tombstone removal).

That essentially boils down to this: any write that happened prior to the 
start of the gc grace period must have been propagated, whether by hinted 
hand-off or by 'nodetool repair'. Since hinted hand-off is only ever an 
optimization, only nodetool repair is relevant to maintaining the invariant.
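
To make the invariant concrete, here is a minimal sketch of the check a 
monitoring script might perform. It assumes the node exposes the start time 
of the last successful repair; the function name, its parameters, and the 
optional safety margin are hypothetical and not part of any existing 
Cassandra API.

```python
from typing import Optional


def repair_is_overdue(
    last_repair_start_s: Optional[float],
    now_s: float,
    gc_grace_seconds: float,
    margin_seconds: float = 0.0,
) -> bool:
    """Return True if the tombstone-safety invariant may be violated.

    last_repair_start_s is the (hypothetical) exposed start time of the
    last successful repair, in seconds since the epoch, or None if no
    repair has ever succeeded. margin_seconds is an assumed allowance
    for alerting before the gc grace window is fully used up.
    """
    if last_repair_start_s is None:
        # Never repaired: nothing guarantees propagation of old writes.
        return True
    age = now_s - last_repair_start_s
    # Writes older than the repair start are covered; the repair must
    # have started within the current gc grace window.
    return age > gc_grace_seconds - margin_seconds
```

With a gc grace period of 10 days, a repair that started 5 days ago is fine, 
while one that started 11 days ago (or none at all) should raise an alert.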

An AES session is only guaranteed to "see" things that existed in the form 
of sstables at the point where it started. This presumably means that AES 
implies a memtable flush (if not, it would be broken, I think).

So that in turn means that the time to record as 'last successful repair' needs 
to be before the flushing of memtables.

It should be noted that, of course, for monitoring purposes this isn't about 
a few milliseconds here and there. So maybe it's acceptable to fudge the 
memtable flushing (although I'm not personally comfortable with that 
either); but the millisecond timestamp must definitely be taken *before* 
the validating compaction, since that can clearly take a lot of time (even 
days for large CFs).
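
The ordering I have in mind can be sketched as follows. This is only an 
illustration, not the attached patch: the class, the injected clock, and the 
two callables standing in for the memtable flush and the validation/sync 
work are all hypothetical.

```python
import time
from typing import Callable, Optional


class RepairTracker:
    """Records the start time of the last repair that completed successfully."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self.last_successful_repair_start: Optional[float] = None

    def run_repair(
        self,
        flush_memtables: Callable[[], None],
        validate_and_sync: Callable[[], None],
    ) -> float:
        # Take the timestamp first, before the flush and before the
        # (possibly very long) validating compaction.
        started_at = self._clock()
        flush_memtables()
        validate_and_sync()
        # Publish only after everything succeeded; if either step raises,
        # the previously recorded value is left untouched.
        self.last_successful_repair_start = started_at
        return started_at
```

The point is that the published value is the time the session began, not 
the time it finished, and that a failed session never updates it.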


> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is 
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since 
> last successful repair for a particular column family, to make it easier to 
> write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
