[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

Sylvain Lebresne (JIRA) Wed, 15 Jun 2011 06:51:55 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049770#comment-13049770
 ]


Sylvain Lebresne commented on CASSANDRA-2405:
---------------------------------------------

The problem with using the completion time as the (Super)Column name is that 
you have to wait for the end of the repair to store anything. First, this will 
not capture sessions that started but failed (which, while not mandatory, could 
be nice: once we start keeping a bit more info, this could help 
troubleshooting). Second, it will be a pain to have to hold on to some of the 
information until the end (processingStartedAt is a first sign of this). 
Third, we may want to keep some info on, say, merkle tree creation on all 
replicas participating in the repair, even though we only store the completion 
time on the node initiating the repair.

So I would propose something like:
  row key: KS/CF
  super column name: repair session name (a TimeUUID)
  columns: the info on the session (range, start and end time, number of 
ranges repaired, bytes transferred, ...)

That is roughly the same thing as you propose, but with the super column name 
being the repair session name.
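
To make the layout concrete, here is a minimal in-memory sketch of it. This is 
not Cassandra code: the dict, the function names and the column names 
("range", "started_at", "completed_at", ...) are all made up for illustration. 
The point is only that the session is written as soon as it starts, so a 
session that fails later still leaves a trace.

```python
import uuid

# Hypothetical stand-in for the proposed system table:
#   row key "KS/CF" -> {session TimeUUID -> {column name: value}}
repair_history = {}


def start_session(keyspace, column_family, token_range, started_at_ms):
    """Record a session when it starts; returns its TimeUUID name."""
    session_id = uuid.uuid1()  # version 1 UUID, i.e. time-based
    row = repair_history.setdefault(f"{keyspace}/{column_family}", {})
    # Column names here are assumptions, not an agreed schema.
    row[session_id] = {"range": token_range, "started_at": started_at_ms}
    return session_id


def update_session(keyspace, column_family, session_id, **columns):
    """Add or overwrite columns on an existing session as info arrives."""
    repair_history[f"{keyspace}/{column_family}"][session_id].update(columns)
```

A session that never reaches update_session(..., completed_at=...) simply 
stays in the row without a completion column.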

Now, because the repair session names are TimeUUIDs (well, right now it is a 
string including a UUID, but we can easily change it to a plain TimeUUID), the 
sessions will be ordered by creation time. So getting the last successful 
repair is probably not too hard: just grab the last 1000 created sessions and 
find the last successful one.
And if we want, we can even use another specific "index" row that associates 
'completion time' -> 'session UUID' (and thanks to the new DynamicCompositeType 
we can have some rows ordered by TimeUUIDType and others ordered by LongType 
without needing multiple system tables).
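
The two lookups could be sketched roughly as follows, again on plain dicts 
rather than real Cassandra rows. The function names and the "completed_at" 
column are assumptions; sorting on the UUID timestamp only emulates what 
TimeUUIDType ordering would give us for free.

```python
from typing import Dict, Optional
from uuid import UUID


def last_successful_by_scan(row: Dict[UUID, dict], limit: int = 1000) -> Optional[UUID]:
    """Scan the most recently created sessions for one that completed."""
    # Version 1 UUIDs expose their timestamp as .time; newest first.
    newest_first = sorted(row, key=lambda sid: sid.time, reverse=True)[:limit]
    for sid in newest_first:
        # "completed_at" is an assumed column marking a successful session.
        if "completed_at" in row[sid]:
            return sid
    return None


def last_successful_by_index(index_row: Dict[int, UUID]) -> Optional[UUID]:
    """Use the 'completion time' -> 'session UUID' index row instead."""
    if not index_row:
        return None
    # Keys are completion times (longs); the largest is the latest.
    return index_row[max(index_row)]
```

The scan can miss a success that is older than the last `limit` sessions; the 
index row avoids that at the cost of one extra write per completed session.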

> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.8.1
>
>         Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
> CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is 
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since 
> last successful repair for a particular column family, to make it easier to 
> write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
