[ https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049770#comment-13049770 ]
Sylvain Lebresne commented on CASSANDRA-2405:
---------------------------------------------

The problem with using the completion time as the (Super)Column name is that you have to wait for the end of the repair to store anything. First, this will not capture sessions that started but failed (which, while not mandatory, could be nice; especially once we start keeping a bit more info, this could help troubleshooting). Second, it will be a pain to have to keep some of the information until the end (processingStartedAt is a first sign of this). And third, we may want to keep some info on, say, merkle tree creation on all replicas participating in the repair, even though we only store the completion time on the node initiating the repair.

So I would propose something like:
  row key: KS/CF
  super column name: repair session name (a TimeUUID)
  columns: the info on the session (range, start and end time, number of ranges repaired, bytes transferred, ...)

That is roughly the same thing as you propose, but with the super column name being the repair session name. Now, because the repair session names are TimeUUIDs (well, right now it is a string including a UUID, but we can easily change it to a simple TimeUUID), the sessions will be ordered by creation time. So getting the last successful repair is probably not too hard: just grab the last 1000 created sessions and find the last successful one. And if we want, we can even use another specific "index" row that associates 'completion time' -> 'session UUID' (and thanks to the new DynamicCompositeType we can have some rows ordered by TimeUUIDType and others ordered by LongType without needing multiple system tables).
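To make the proposed layout concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code: the `repair_history` dict merely stands in for the system table, and the function names and the column names `started_at`, `completed_at` and `success` are assumptions made for illustration. It shows a session being recorded as soon as it starts, and the "grab the last 1000 created sessions" lookup walking the sessions newest-first by the TimeUUID timestamp.

```python
import uuid

# Hypothetical stand-in for the proposed system table:
# row key "KS/CF" -> {session TimeUUID -> {column name -> value}}
repair_history = {}


def start_session(keyspace, column_family, token_range, started_at, session_id=None):
    """Store the session at start time, so failed sessions remain visible."""
    if session_id is None:
        session_id = uuid.uuid1()  # version 1 UUID, i.e. time-based
    row = repair_history.setdefault(f"{keyspace}/{column_family}", {})
    row[session_id] = {"range": token_range, "started_at": started_at}
    return session_id


def finish_session(keyspace, column_family, session_id, completed_at, success):
    """Add the end-of-session columns to an already stored session."""
    columns = repair_history[f"{keyspace}/{column_family}"][session_id]
    columns["completed_at"] = completed_at
    columns["success"] = success


def last_successful_repair(keyspace, column_family, limit=1000):
    """Scan the `limit` most recently created sessions, newest first."""
    row = repair_history.get(f"{keyspace}/{column_family}", {})
    # Order by the timestamp embedded in the UUID (creation time).
    newest_first = sorted(row, key=lambda sid: sid.time, reverse=True)[:limit]
    for sid in newest_first:
        if row[sid].get("success"):
            return sid, row[sid]["completed_at"]
    return None  # no successful session among the scanned ones
```

A monitoring script could then compare the returned completion time with the current time to obtain the 'time since last successful repair'. Note that a session that never finished simply lacks the `success` column and is skipped by the scan.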
> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.8.1
>
>         Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch,
> CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since
> last successful repair for a particular column family, to make it easier to
> write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira