[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055507#comment-13055507
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
---------------------------------------------

I'm sorry, but I still think we are returning the wrong number to the 
user. To be clear, this is nothing against the code of the patch itself, I just 
think that given the way repair works, it is not so simple to have a "time 
since last successful repair".

The "unit" of a repair is for a given keyspace, column family and range. 
Because of that, I don't think we can return a single "time since last 
successful repair" for a given keyspace and column family. It has to include 
the range somehow. Granted, so far a nodetool repair repairs all the ranges of 
the node you launch it on, but I don't think this should be the case 
(CASSANDRA-2610). Moreover, even now, one of the ranges can fail without the 
others. So returning only one number for all ranges is wrong.
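
To illustrate the point, here is a minimal sketch of bookkeeping keyed by the 
full repair unit (keyspace, column family, range). It is not taken from the 
patch: the names RepairLog, record_success and seconds_since are hypothetical, 
and a token range is simplified to a (left, right) tuple of integers.

```python
import time
from typing import Dict, Optional, Tuple

# Assumption: a token range is simplified to a (left, right) pair of ints.
Range = Tuple[int, int]
# The unit of a repair: (keyspace, column family, range).
Key = Tuple[str, str, Range]


class RepairLog:
    """Hypothetical per-range record of the last successful repair."""

    def __init__(self) -> None:
        self._last_success: Dict[Key, float] = {}

    def record_success(self, keyspace: str, cf: str, rng: Range,
                       now: Optional[float] = None) -> None:
        # Store the completion time for this exact (keyspace, cf, range).
        stamp = time.time() if now is None else now
        self._last_success[(keyspace, cf, rng)] = stamp

    def seconds_since(self, keyspace: str, cf: str, rng: Range,
                      now: Optional[float] = None) -> Optional[float]:
        # None means this range has no recorded successful repair.
        last = self._last_success.get((keyspace, cf, rng))
        if last is None:
            return None
        current = time.time() if now is None else now
        return current - last
```

With this shape, a range whose repair failed simply keeps its older timestamp 
(or none at all), instead of being hidden behind a single number for the whole 
column family.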

The other problem is that I'm not convinced recording the information only on 
the node coordinating the repair is all that helpful. When you start a repair 
on a node, you also repair its neighbors (only for the ranges they share), so 
recording the time only on the node the nodetool command was connected to is 
arbitrary, and it conveys the idea that repair should be started for every 
range on every node (while I strongly think that the short-term goal should be 
to make it easy to NOT do that -- CASSANDRA-2610 again).

Imho, we should hold back on this issue for now and at least wait for 
CASSANDRA-2610, CASSANDRA-2606 and CASSANDRA-2816 before committing to 
anything. I agree that having information to help people plan repair is nice, 
but it is at most a very minor improvement, and exposing a misleading number is 
more harmful than exposing no number.


> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 0.8.2
>
>         Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
> CASSANDRA-2405-v4.patch, CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is 
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since 
> last successful repair for a particular column family, to make it easier to 
> write a correct script to monitor for lack of repair in a non-buggy fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
