[jira] [Comment Edited] (CASSANDRA-9742) Nodetool verify

Jeff Jirsa (JIRA) Tue, 07 Jul 2015 18:18:12 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617790#comment-14617790 ]


Jeff Jirsa edited comment on CASSANDRA-9742 at 7/8/15 1:17 AM:
---------------------------------------------------------------

Operator perspective, fwiw: I already have repair schedules. I already know 
what needs to be repaired and what doesn't. What I didn't have, previously, was 
a way to verify that the files on disk actually contained what I believed they 
contained, short of running scrub.

`verify` was very literally a read-only scrub. When I wrote CASSANDRA-5791, I 
followed the scrub code path very closely, because that was the use case I was 
worried about: bit-level corruption due to a failing HDD/RAID controller. Scrub 
would do the job, but it's a heavy hammer hitting a tiny nail. The notion of 
"verify this node has all the data" was already covered by repair, so I never 
even considered having `verify` do that.

Why not just (add a flag to) have incremental repair validate checksums for 
all sstables? On checksum failure, the verifier will 
{{mutateRepairedAt(sstable.descriptor, ActiveRepairService.UNREPAIRED_SSTABLE)}}, 
which then allows incremental repair to re-repair that data.
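A minimal sketch of the proposed flow, in Java since that is Cassandra's language. This is a toy model, not Cassandra's code: `ToySSTable`, `ToyVerifier` and `VerifySketch` are hypothetical names, CRC32 over an in-memory byte array stands in for whatever digest the real sstable stores, and the unrepaired marker is assumed to be 0. It illustrates the two properties the comment relies on: the data is only read, and a checksum failure only resets `repairedAt`, so a later incremental repair no longer skips that sstable.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Toy sstable: payload bytes, the checksum recorded when it was written,
// and the repairedAt marker that incremental repair uses to skip data.
final class ToySSTable {
    // Assumed stand-in for ActiveRepairService.UNREPAIRED_SSTABLE.
    static final long UNREPAIRED_SSTABLE = 0L;

    final String name;
    final byte[] data;
    final long storedChecksum;
    long repairedAt;

    ToySSTable(String name, byte[] data, long repairedAt) {
        this.name = name;
        this.data = data;
        this.storedChecksum = crc32(data);
        this.repairedAt = repairedAt;
    }

    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}

final class ToyVerifier {
    // Checks every sstable, repaired or not, by recomputing its checksum and
    // comparing it with the stored one. The data is never modified; on a
    // mismatch only repairedAt is reset (a hypothetical analogue of
    // mutateRepairedAt), so the next incremental repair includes it again.
    static List<String> verifyAll(List<ToySSTable> sstables) {
        List<String> corrupted = new ArrayList<>();
        for (ToySSTable s : sstables) {
            if (ToySSTable.crc32(s.data) != s.storedChecksum) {
                s.repairedAt = ToySSTable.UNREPAIRED_SSTABLE;
                corrupted.add(s.name);
            }
        }
        return corrupted;
    }
}

class VerifySketch {
    public static void main(String[] args) {
        long repairedAt = 1436300000000L; // arbitrary "already repaired" timestamp
        ToySSTable a = new ToySSTable("sstable-a", "alpha".getBytes(StandardCharsets.UTF_8), repairedAt);
        ToySSTable b = new ToySSTable("sstable-b", "bravo".getBytes(StandardCharsets.UTF_8), repairedAt);
        b.data[0] ^= 0x01; // flip one bit to simulate on-disk corruption
        List<String> bad = ToyVerifier.verifyAll(Arrays.asList(a, b));
        System.out.println(bad + " " + a.repairedAt + " " + b.repairedAt); // [sstable-b] 1436300000000 0
    }
}
```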
 



was (Author: jjirsa):
Operator perspective, fwiw: I already have repair schedules. I already know 
what needs to be repaired and what doesn't. What I didn't have, previously, was 
a way to verify that the files on disk actually contained what I believed they 
contained, short of running scrub.

`verify` was very literally a read-only scrub. When I wrote CASSANDRA-5791, I 
followed the scrub code path very closely, because that was the use case I was 
worried about: bit-level corruption due to a failing HDD/RAID controller. Scrub 
would do the job, but it's a heavy hammer hitting a tiny nail. The notion of 
"verify this node has all the data" was already covered by repair, so I never 
even considered having `verify` do that.

Why not just have incremental repair validate checksums for all sstables? On 
checksum failure, the verifier will 
{{mutateRepairedAt(sstable.descriptor, ActiveRepairService.UNREPAIRED_SSTABLE)}}, 
which then allows incremental repair to re-repair that data.
 


> Nodetool verify
> ---------------
>
>                 Key: CASSANDRA-9742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9742
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> We introduced incremental repair in 2.1, but it is difficult to make it the 
> default without unpleasant surprises for incautious users.
> Additionally, while we now store sstable checksums, we leave verification to 
> the user.
> I propose introducing a new command, {{nodetool verify}}, that would address 
> both of these.
> Default operation would be to run an incremental repair and also validate 
> checksums on *all* sstables (not just unrepaired ones). We could also have a 
> --local mode (checksums only) and a --full mode (classic repair).
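
One way to read the proposed modes, sketched in Java. `VerifyStep`, `VerifyPlan` and `VerifyPlanDemo` are hypothetical names, and treating `--local` and `--full` as mutually exclusive is an assumption the ticket does not state; the sketch only maps each option to the work the proposal describes.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical step names; the ticket proposes the modes, not these identifiers.
enum VerifyStep { VALIDATE_ALL_CHECKSUMS, INCREMENTAL_REPAIR, FULL_REPAIR }

final class VerifyPlan {
    // Maps the proposed command-line options to the work each mode would do.
    static Set<VerifyStep> stepsFor(String... args) {
        boolean local = Arrays.asList(args).contains("--local");
        boolean full = Arrays.asList(args).contains("--full");
        if (local && full) {
            // Assumption: the two modes cannot be combined.
            throw new IllegalArgumentException("--local and --full are mutually exclusive");
        }
        if (local) {
            return EnumSet.of(VerifyStep.VALIDATE_ALL_CHECKSUMS); // checksums only
        }
        if (full) {
            return EnumSet.of(VerifyStep.FULL_REPAIR); // classic repair
        }
        // Default: incremental repair plus checksum validation on *all* sstables,
        // including the ones already marked repaired.
        return EnumSet.of(VerifyStep.VALIDATE_ALL_CHECKSUMS, VerifyStep.INCREMENTAL_REPAIR);
    }
}

class VerifyPlanDemo {
    public static void main(String[] args) {
        System.out.println(VerifyPlan.stepsFor());          // [VALIDATE_ALL_CHECKSUMS, INCREMENTAL_REPAIR]
        System.out.println(VerifyPlan.stepsFor("--local")); // [VALIDATE_ALL_CHECKSUMS]
        System.out.println(VerifyPlan.stepsFor("--full"));  // [FULL_REPAIR]
    }
}
```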



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
