[
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021188#comment-13021188
]
Peter Schuller commented on CASSANDRA-2405:
-------------------------------------------
The best solution I can think of is to populate the information on CF creation
with the timestamp that represents the time the CF was created on the node. If
the node was bootstrapped as usual, that would have happened after the local CF
creation. If it was not (e.g. forcefully inserted into the ring), then some
operator has explicitly made the choice of entering it into the ring
"inconsistently" anyway so it doesn't matter.
If this is easy to do, I think it would make for a really clean solution from
the point of view of the user. The nodetool command would always return valid
data except if something is truly broken; not even a single edge case to deal
with. Simplicity rocks for this type of thing (for writing a monitoring script
to trigger an alarm).
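
A minimal sketch of the creation-time seeding idea described above, in Python
for brevity. The class and method names (RepairTracker, on_cf_created,
on_repair_succeeded, seconds_since_last_repair) are hypothetical and are not
taken from the Cassandra code base or the attached patch; the sketch only
illustrates why seeding at CF creation leaves no "never repaired" edge case.

```python
import time


class RepairTracker:
    """Hypothetical per-node record of the last successful repair per CF."""

    def __init__(self, clock=time.time):
        # The clock is injectable so the behaviour can be checked in tests.
        self._clock = clock
        self._last_repair = {}

    def on_cf_created(self, cf_name):
        # Seed with the local creation time, so a value always exists.
        # setdefault keeps an existing timestamp if the CF is already known.
        self._last_repair.setdefault(cf_name, self._clock())

    def on_repair_succeeded(self, cf_name):
        # A successful repair simply overwrites the stored timestamp.
        self._last_repair[cf_name] = self._clock()

    def seconds_since_last_repair(self, cf_name):
        # Always a non-negative age for a known CF; no sentinel is needed.
        return self._clock() - self._last_repair[cf_name]
```

With this shape, a monitoring script only has to compare the returned age
against a threshold; a CF that has never been repaired just looks
increasingly stale, counted from the moment it was created on the node.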
If that's overkill/non-easy, I dunno - slight preference for throwing an
exception just because I really dislike silent failures and returning an
out-of-band integer seems more likely to go unnoticed if somehow it never
changes because repair is *never* run, for example. I.e., either your monitoring
script treats -1 as an error anyway (so it's no worse in terms of triggering
the alarm unnecessarily than an exception), or it doesn't - in which case you
have a silent failure mode in the case of perpetual lack of repair running.
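
The silent failure mode described above can be illustrated with a small
Python sketch. Everything here is an assumption made for illustration: the
function names, the RepairInfoUnavailable exception, and the convention that
-1 means "no repair recorded" are hypothetical, not the behaviour of nodetool
or of the attached patch.

```python
class RepairInfoUnavailable(Exception):
    """Hypothetical error raised when no repair has been recorded."""


def naive_check(seconds_since_repair, max_age_seconds):
    # A script that does not special-case the sentinel: -1 is always
    # below the threshold, so the alarm never fires if repair never runs.
    return seconds_since_repair > max_age_seconds


def strict_check(seconds_since_repair, max_age_seconds):
    # Treat a negative value as an error instead of a valid age, which
    # is the same outcome an exception from the server would force.
    if seconds_since_repair < 0:
        raise RepairInfoUnavailable("no successful repair recorded")
    return seconds_since_repair > max_age_seconds
```

Under these assumptions, naive_check(-1, ...) reports "no alarm" forever,
while strict_check fails loudly, which is the trade-off argued for above.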
> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-2405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Pavel Yaskevich
> Priority: Minor
> Fix For: 0.7.5
>
> Attachments: CASSANDRA-2405.patch
>
>
> The practical problem of actually ensuring that repair runs is
> largely undocumented and unaddressed.
> One hopefully low-hanging fruit would be to at least expose the time since
> last successful repair for a particular column family, to make it easier to
> write a correct script to monitor for lack of repair in a non-buggy fashion.