[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079314#comment-13079314 ]
Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. checksum at the column level only which will give us better control over individual columns and does not seem to be a big overhead

I agree that it is by far the simplest approach for non-compressed data, but I, for one, am a bit concerned by the overhead: 4 bytes per column is not negligible. On some loads, that could easily mean a 10-20% data size increase. Basically, I am concerned about people upgrading to 1.0 and want to make sure that upgrading brings no surprises for them (even if they don't "trust" compression yet, which would be perfectly reasonable). For that to be true, I think that if we go with checksums at the column level, we would need to make them optional and off by default.

bq. Checksum on the compressed block level is unnecessary because bitrot, for example, will be detected right on decompression

I'm not sure that's bulletproof. I don't think all compression algorithms ship with a checksum (I don't know about snappy, typically). When they don't, it is entirely possible for bitrot to corrupt compressed data without causing a problem at decompression or at deserialization, if you're unlucky (granted, corruption is less likely to go undetected than without compression, but that is not good enough). So either we check that snappy uses checksumming and only add support for algorithms that do, or block-level checksumming is still useful.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row)
> unreadable, so the data can be replaced by read repair or anti-entropy.
> But if the corruption keeps column data readable, we do not detect it, and if it
> corrupts to a higher timestamp value, it can even resist being overwritten by
> newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
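The per-column checksum approach discussed in the comment can be sketched as follows. This is a minimal illustration using Java's standard `java.util.zip.CRC32`; the column layout (`name length, name, value length, value, timestamp`) and the class and method names are hypothetical, not Cassandra's actual on-disk serialization format. It shows both the 4-byte-per-column overhead being debated and how such a checksum catches a single flipped bit that would otherwise deserialize cleanly:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

// Illustrative sketch only: layout and names are not Cassandra's real format.
public class ColumnChecksumSketch {

    // Serialize a column (name, value, timestamp) and append a CRC32
    // of the serialized bytes: exactly 4 extra bytes per column.
    static byte[] writeWithChecksum(byte[] name, byte[] value, long timestamp) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeShort(name.length);
            out.write(name);
            out.writeInt(value.length);
            out.write(value);
            out.writeLong(timestamp);
            byte[] serialized = bos.toByteArray();

            CRC32 crc = new CRC32();
            crc.update(serialized);
            out.writeInt((int) crc.getValue()); // the 4-byte overhead
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for in-memory streams
        }
    }

    // On read: recompute CRC32 over everything except the trailing 4 bytes
    // and compare it to the stored (big-endian) checksum.
    static boolean verify(byte[] columnWithChecksum) {
        int dataLen = columnWithChecksum.length - 4;
        CRC32 crc = new CRC32();
        crc.update(columnWithChecksum, 0, dataLen);
        int stored = ((columnWithChecksum[dataLen] & 0xff) << 24)
                   | ((columnWithChecksum[dataLen + 1] & 0xff) << 16)
                   | ((columnWithChecksum[dataLen + 2] & 0xff) << 8)
                   |  (columnWithChecksum[dataLen + 3] & 0xff);
        return (int) crc.getValue() == stored;
    }

    public static void main(String[] args) {
        byte[] col = writeWithChecksum("name".getBytes(), "value".getBytes(), 1234L);
        System.out.println(verify(col)); // true: intact column checks out
        col[5] ^= 0x01;                  // simulate one bit of bitrot in the name
        System.out.println(verify(col)); // false: corruption is now detectable
    }
}
```

The same pattern applied at the compressed-block level (a CRC stored alongside each compressed block and verified before decompression) would address the second concern in the comment, since decompression succeeding does not by itself prove the compressed bytes were intact.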