[ https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021118#comment-13021118 ]
Peter Schuller commented on CASSANDRA-2405:
-------------------------------------------
If I'm reading the code correctly, then no, I mean earlier. Recall why AES
matters w.r.t. GC grace seconds: for it to be safe to remove a tombstone for
some piece of data, that data must be guaranteed to have become consistent
across the cluster up to the start of the gc grace period (as measured at the
moment of tombstone removal).
That essentially boils down to this: any write that happened prior to the start
of the gc grace period must have been propagated, whether by hinted hand-off or
by 'nodetool repair'. Since hinted hand-off is only ever an optimization, only
nodetool repair is relevant to maintaining the invariant.
An AES session is only guaranteed to "see" data that existed in the form of
sstables at the point where it started. This presumably means that AES implies
a memtable flush (if not, I think it would be broken).
That in turn means that the time recorded as 'last successful repair' needs to
be taken before the memtables are flushed.
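To make the ordering concrete, here is a minimal sketch in Python (not
Cassandra code). RepairTracker, run_repair and the two callables are
hypothetical names invented for illustration; the only point is that the
timestamp is captured before the flush and published only if the whole
session succeeds.

```python
import time


class RepairTracker:
    """Hypothetical helper illustrating the ordering; not a Cassandra API."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # None means "no successful repair has been recorded yet".
        self.last_successful_repair_ms = None

    def run_repair(self, flush_memtables, validate_and_stream):
        # Capture the timestamp first: every write before this instant is
        # either already in an sstable or will be flushed into one below.
        started_ms = int(self._clock() * 1000)

        flush_memtables()        # assumed to block until the flush is done
        validate_and_stream()    # assumed to raise if the session fails

        # Publish only on success, and publish the *start* time, so a slow
        # validating compaction never makes the repair look newer than it is.
        self.last_successful_repair_ms = started_ms
        return started_ms
```

If either callable raises, last_successful_repair_ms keeps its previous
value, so a failed session never advances the recorded time.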
It should be noted that, of course, for monitoring purposes this isn't about a
few milliseconds here and there. So maybe that's enough to fudge the memtable
flushing (although I'm not personally comfortable with that either); but the
time it takes to do the validating compaction must definitely be counted
*after* the millisecond timestamp, since that can clearly take a lot of time
(even days for large CFs).
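On the monitoring side, a check built on such a timestamp could look roughly
like the sketch below. repair_overdue and its parameters are hypothetical
names, and how the timestamp is obtained from the node is deliberately left
out; the sketch assumes the value is the start time of the last successful
repair, in milliseconds.

```python
def repair_overdue(last_repair_start_ms, now_ms, gc_grace_seconds,
                   safety_margin_seconds=0):
    """Return True if the column family should be flagged for repair.

    A successful repair only vouches for writes made before it started, so
    the age is measured from the repair's start, not from its completion.
    """
    if last_repair_start_ms is None:
        # No successful repair on record: always flag it.
        return True
    age_seconds = (now_ms - last_repair_start_ms) / 1000.0
    # Alert early by safety_margin_seconds so there is time left to run
    # (and finish) a repair before gc grace is exceeded.
    return age_seconds > gc_grace_seconds - safety_margin_seconds
```

For example, with gc_grace_seconds set to 864000 (10 days), a repair that
started 9 days ago is not yet overdue, but it is flagged once a two-day
safety margin is applied.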
> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-2405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Pavel Yaskevich
> Priority: Minor
> Fix For: 0.7.5
>
> Attachments: CASSANDRA-2405.patch
>
>
> The practical problem of actually ensuring that repair runs is
> somewhat of an undocumented/untreated issue.
> One hopefully low-hanging fruit would be to at least expose the time since
> last successful repair for a particular column family, to make it easier to
> write a correct script to monitor for lack of repair in a non-buggy fashion.