[
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055507#comment-13055507
]
Sylvain Lebresne commented on CASSANDRA-2405:
---------------------------------------------
I'm sorry, but I still think we are returning the wrong number to the
user. To be clear, this is nothing against the code of the patch itself; I just
think that, given the way repair works, it is not so simple to have a "time
since last successful repair".
The "unit" of a repair is for a given keyspace, column family and range.
Because of that, I don't think we can return a single "time since last
successful repair" for a given keyspace and column family. It has to include
the range somehow. Granted, so far a nodetool repair repairs all the ranges of
the node you launch it on, but I don't think this should be the case
(CASSANDRA-2610). Moreover, even now, one of the ranges can fail without the
others. So returning only one number for all ranges is wrong.
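A minimal sketch of what tracking at that granularity could look like, assuming the last success time is keyed by keyspace, column family and token range rather than by keyspace and column family alone. All names here (RepairTimes, Unit, recordSuccess, millisSinceLastSuccess) are hypothetical and come neither from the patch nor from Cassandra itself; token range bounds are represented as plain strings for simplicity.

```java
import java.util.Map;
import java.util.Objects;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: not part of Cassandra or of the attached patches.
final class RepairTimes {

    // One repair "unit": keyspace + column family + token range (left, right].
    static final class Unit {
        final String keyspace;
        final String columnFamily;
        final String left;
        final String right;

        Unit(String keyspace, String columnFamily, String left, String right) {
            this.keyspace = keyspace;
            this.columnFamily = columnFamily;
            this.left = left;
            this.right = right;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Unit)) return false;
            Unit u = (Unit) o;
            return keyspace.equals(u.keyspace)
                && columnFamily.equals(u.columnFamily)
                && left.equals(u.left)
                && right.equals(u.right);
        }

        @Override
        public int hashCode() {
            return Objects.hash(keyspace, columnFamily, left, right);
        }
    }

    // Timestamp (ms) of the last successful repair, one entry per unit.
    private final Map<Unit, Long> lastSuccessMillis = new ConcurrentHashMap<>();

    // Called only when the repair of this exact range completed successfully.
    void recordSuccess(String ks, String cf, String left, String right, long nowMillis) {
        lastSuccessMillis.put(new Unit(ks, cf, left, right), nowMillis);
    }

    // Empty result means "this range has never been successfully repaired".
    OptionalLong millisSinceLastSuccess(String ks, String cf, String left, String right,
                                        long nowMillis) {
        Long t = lastSuccessMillis.get(new Unit(ks, cf, left, right));
        return t == null ? OptionalLong.empty() : OptionalLong.of(nowMillis - t);
    }
}
```

With this shape, a failed repair of one range leaves that range's entry untouched while the other ranges of the same column family still advance, which is the case a single per-column-family number cannot represent.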
The other problem is: I'm not convinced that recording the information only on
the node coordinating the repair is necessarily very helpful. When you start a
repair on a node, you also repair its neighbors (only for the ranges they
share), so recording the time only on the node the nodetool command happened
to be connected to is arbitrary, and will convey the idea that repair should be
started for every range on every node (while I strongly think that the
short-term goal should be to make it easy to NOT do that -- CASSANDRA-2610 again).
Imho, we should hold back on this issue for now and at least wait for
CASSANDRA-2610, CASSANDRA-2606 and CASSANDRA-2816 before committing to
anything. I agree that having information to help people plan repair is nice,
but it is at most a very minor improvement, and exposing a misleading number is
more harmful than exposing no number.
> should expose 'time since last successful repair' for easier aes monitoring
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-2405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Peter Schuller
> Assignee: Pavel Yaskevich
> Priority: Minor
> Fix For: 0.8.2
>
> Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch,
> CASSANDRA-2405-v4.patch, CASSANDRA-2405.patch
>
>
> The practical implementation issues of actually ensuring repair runs is
> somewhat of an undocumented/untreated issue.
> One hopefully low hanging fruit would be to at least expose the time since
> last successful repair for a particular column family, to make it easier to
> write a correct script to monitor for lack of repair in a non-buggy fashion.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira