[ https://issues.apache.org/jira/browse/CASSANDRA-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000955#comment-15000955 ]

Ariel Weisberg commented on CASSANDRA-9947:
-------------------------------------------

The fact that we never validate checksums on uncompressed data on reads creates
problems for repair even before verify is run. We can propagate corrupted data
because the Merkle tree comparison will flag the corrupt range as a mismatch,
and repair will then stream the corrupt data without ever validating its
checksum.
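
To make that concrete, here is a minimal sketch of what read-path validation
could look like: check a per-block CRC before the bytes feed into anything
like a Merkle tree digest. The block layout and the sibling checksum file are
assumptions for illustration, not the actual reader code.

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.CRC32;

    public class ChecksummedBlockReader
    {
        // Read block `blockIndex` from the data file and verify it against
        // the CRC stored at the matching position in a sibling checksum file
        // (one 4-byte CRC per block -- an assumed layout, for illustration).
        public static byte[] readBlock(String dataPath, String crcPath,
                                       int blockSize, int blockIndex) throws IOException
        {
            byte[] block = new byte[blockSize];
            int read;
            try (FileInputStream in = new FileInputStream(dataPath))
            {
                in.getChannel().position((long) blockIndex * blockSize);
                read = in.read(block);
                if (read <= 0)
                    throw new IOException("Block " + blockIndex + " past end of file");
            }

            try (DataInputStream crcs = new DataInputStream(new FileInputStream(crcPath)))
            {
                crcs.skipBytes(blockIndex * 4);
                int expected = crcs.readInt();
                CRC32 actual = new CRC32();
                actual.update(block, 0, read);
                if ((int) actual.getValue() != expected)
                    throw new IOException("Checksum mismatch in block " + blockIndex);
            }
            return block;
        }
    }

If the read path did something like this, a corrupt block would fail loudly
instead of flowing into the repair digest.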

Right now scrub isn't going to validate checksums on uncompressed files, based
on my reading, so scrubbing won't improve the situation. I also don't see how
scrub can fix a corrupted compressed table, since the checksum is not per
record; it covers an arbitrary 64k page. You could try to parse the page
anyway, but that is not what is currently done, since the reader will just
throw an exception if you try. Corrupted sstables work fine in the regular read
path because the index points you to a valid place to start reading from, but
that won't work for a sequential walk through the file.
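
Roughly, the compressed read path has to treat a chunk as all-or-nothing: the
checksum covers the entire compressed page, so there are no per-record
boundaries to fall back on. The chunk layout and names below are illustrative
(zlib instead of the real compressors), not the actual on-disk format.

    import java.io.IOException;
    import java.util.zip.CRC32;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    public class CompressedChunk
    {
        // A corrupt chunk fails as a unit: the checksum covers the whole
        // compressed page, so individual records can't be salvaged from it.
        public static byte[] readChunk(byte[] compressed, int storedCrc,
                                       int uncompressedLength) throws IOException
        {
            CRC32 crc = new CRC32();
            crc.update(compressed);
            if ((int) crc.getValue() != storedCrc)
                throw new IOException("Corrupt chunk: checksum mismatch");

            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] out = new byte[uncompressedLength];
            try
            {
                int n = inflater.inflate(out);
                if (n != uncompressedLength)
                    throw new IOException("Corrupt chunk: short inflate");
            }
            catch (DataFormatException e)
            {
                throw new IOException("Corrupt chunk: undecompressable", e);
            }
            finally
            {
                inflater.end();
            }
            return out;
        }
    }

Every record that happens to live in that 64k page fails together, which is
why scrub can't recover a compressed table record by record.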

It seems to me like we are shuffling deck chairs on the Titanic once we allow
repair to propagate corrupted data. You could say the same about returning
corrupted data to user queries, since those results can be written back into
C* and propagate the corruption to all replicas.

If there are corruption-handling flows we want to support, it might make sense
to create some test cases for the various file formats and see what the
existing code actually does. My suspicion is that sequential access is going to
fail in the compressed case and blindly succeed in the uncompressed case.
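
The kind of test case I mean is trivial to write: flip one byte in the middle
of a data file, then walk the sstable sequentially, once for the compressed
format and once for the uncompressed one. The helper below is a stand-in, not
actual Cassandra test code.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class CorruptionTestSketch
    {
        // Flip every bit of one byte at `offset`, simulating on-disk corruption.
        public static void corruptOneByte(String path, long offset) throws IOException
        {
            try (RandomAccessFile f = new RandomAccessFile(path, "rw"))
            {
                f.seek(offset);
                int b = f.read();
                if (b < 0)
                    throw new IOException("Offset past end of file");
                f.seek(offset);
                f.write(b ^ 0xFF);
            }
        }
    }

My expectation is that the compressed walk throws on the corrupted chunk and
the uncompressed walk silently returns garbage.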

We also need to nail down fix versions, since converging on something that
works might not be possible or worthwhile against existing formats. And while
we are at it, maybe we should nail down file formats we are happier with in
terms of being flexible about block sizes, implementing a page cache, etc.
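
Purely as a strawman for "flexible about block sizes": a self-describing
header, so readers stop hard-coding 64k pages. This is not a format proposal,
just the shape of the idea; the magic number and layout are made up.

    import java.nio.ByteBuffer;

    public class BlockFormatHeader
    {
        // Made-up magic number and layout; the point is only that block size
        // is recorded per file instead of being assumed by every reader.
        public static final int MAGIC = 0xCA550001;

        public final int blockSize;   // chosen at write time, e.g. 4k..1m
        public final boolean compressed;

        public BlockFormatHeader(int blockSize, boolean compressed)
        {
            this.blockSize = blockSize;
            this.compressed = compressed;
        }

        public ByteBuffer serialize()
        {
            ByteBuffer buf = ByteBuffer.allocate(12);
            buf.putInt(MAGIC);
            buf.putInt(blockSize);
            buf.putInt(compressed ? 1 : 0);
            buf.flip();
            return buf;
        }
    }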

> nodetool verify is broken
> -------------------------
>
>                 Key: CASSANDRA-9947
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9947
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Priority: Critical
>             Fix For: 2.2.4
>
>
> Raised these issues on CASSANDRA-5791, but didn't revert/re-open, so they 
> were ignored:
> We mark sstables that fail verification as unrepaired, but that's not going 
> to do what you think.  What it means is that the local node will use that 
> sstable in the next repair, but other nodes will not. So all we'll end up 
> doing is streaming whatever data we can read from it, to the other replicas.  
> If we could magically mark whatever sstables correspond on the remote nodes, 
> to the data in the local sstable, that would work, but we can't.
> IMO what we should do is:
> *    scrub, because it's quite likely we'll fail reading from the sstable 
> otherwise and
> *    full repair across the data range covered by the sstable
> Additionally,
> * I'm not sure that keeping "extended verify" code around is worth it. Since 
> the point is to work around not having a checksum, we could just scrub 
> instead. This is slightly more heavyweight but it would be a one-time cost 
> (scrub would build a new checksum) and we wouldn't have to worry about 
> keeping two versions of almost-the-same-code in sync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
