Robert Coli created CASSANDRA-8703:
--------------------------------------

             Summary: incremental repair vs. bitrot
                 Key: CASSANDRA-8703
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8703
             Project: Cassandra
          Issue Type: Bug
            Reporter: Robert Coli


Incremental repair is a great improvement in Cassandra, but it lacks a 
property that non-incremental repair provides: protection against bitrot.

Scenario:

1) repair runs on an SSTable, which is then marked repaired
2) cosmic ray hits hard drive, corrupting a record in that SSTable
3) the range is now effectively unrepaired, but because the SSTable was marked 
repaired earlier, Cassandra still treats it as repaired

From my understanding, if bitrot is detected (e.g. via the CRC on the read path), 
then all SSTables containing the corrupted range need to be marked unrepaired 
on all replicas. Per marcuse@IRC, the naive/simplest response would be to just 
trigger a full repair in this case.
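
To make the "mark unrepaired" step concrete, here is a minimal sketch in 
Python. It is not Cassandra code: SSTableInfo, the repaired_at field, the 
UNREPAIRED sentinel and mark_unrepaired are all hypothetical names, and token 
ranges are treated as simple closed integer intervals with no ring wrap-around. 
It only illustrates the per-replica bookkeeping: every repaired SSTable that 
overlaps the corrupted range loses its repaired status.

```python
from dataclasses import dataclass

# Assumption of this sketch: 0 stands for "not repaired".
UNREPAIRED = 0


@dataclass
class SSTableInfo:
    """Hypothetical, simplified view of one SSTable's metadata."""
    name: str
    first_token: int
    last_token: int
    repaired_at: int  # time of last repair, or UNREPAIRED


def overlaps(sstable, start, end):
    """True if the SSTable's token span intersects [start, end]."""
    return sstable.first_token <= end and start <= sstable.last_token


def mark_unrepaired(sstables, start, end):
    """Demote every repaired SSTable overlapping [start, end].

    Returns the names of the SSTables that were demoted, so the caller
    can decide what to re-repair.
    """
    demoted = []
    for sstable in sstables:
        if sstable.repaired_at != UNREPAIRED and overlaps(sstable, start, end):
            sstable.repaired_at = UNREPAIRED
            demoted.append(sstable.name)
    return demoted
```

The same call would have to run on every replica that owns the range, which is 
why simply triggering a full repair is the simpler response.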

I am concerned about incremental repair as an operational default while it does 
not handle this case. As an aside, this would also seem to require a new CRC on 
the uncompressed read path, as otherwise one cannot detect the corruption 
without periodic checksumming of SSTables. Alternatively, a "nodetool checksum" 
function that verifies table checksums, marks ranges unrepaired on failure, 
and can be run every gc_grace_seconds would seem to meet the 
requirement.
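
As a rough illustration of what such a periodic check could do, the Python 
sketch below verifies data against per-chunk CRC32 values recorded at write 
time and reports which chunks no longer match. The function names, the chunk 
size and the idea of keeping one CRC per fixed-size chunk are assumptions made 
for this example; this is not how "nodetool checksum" would necessarily be 
implemented, and it works on in-memory bytes rather than real SSTable files.

```python
import zlib


def chunk_crcs(data, chunk_size=65536):
    """Return one CRC32 per fixed-size chunk of data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data, expected, chunk_size=65536):
    """Return the indexes of chunks whose CRC32 differs from expected.

    A chunk that is missing on either side (truncated or extended data)
    is reported as corrupt as well.
    """
    actual = chunk_crcs(data, chunk_size)
    total = max(len(actual), len(expected))
    return [
        index
        for index in range(total)
        if index >= len(actual)
        or index >= len(expected)
        or actual[index] != expected[index]
    ]
```

Each reported chunk index would then need to be mapped back to the token range 
it covers, so that only that range is marked unrepaired.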



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
