[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079314#comment-13079314 ]
Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. checksum at the column level only which will give us better control over individual columns and does not seem to be a big overhead

I agree that it is by far the simplest approach for non-compressed data, but I, for one, am a bit concerned by the overhead: 4 bytes per column is not negligible. On some loads, that could easily mean a 10-20% data size increase. Basically, I am concerned about people upgrading to 1.0 and want to make sure that upgrading brings no surprises for them (even if they don't "trust" compression yet, which would be perfectly reasonable). For that to be true, I think that if we go with checksums at the column level, we would need to make them optional and off by default.

bq. Checksum on the compressed block level is unnecessary because bitrot, for example, will be detected right on decompression

I'm not sure that's bulletproof. I don't think all compression algorithms ship with a checksum (I don't know about snappy, typically). When they don't, it is entirely possible for bitrot to corrupt compressed data without causing a problem at decompression or at deserialization, if you're unlucky (granted, corruption is less likely to go undetected than without compression, but that is not good enough). So either we check that snappy uses checksumming and only add support for algorithms that do, or block-level checksumming is still useful.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row)
> unreadable, so the data can be replaced by read repair or anti-entropy.
> But if the corruption keeps column data readable, we do not detect it, and if it
> corrupts to a higher timestamp value, it can even resist being overwritten by
> newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
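The per-column checksum approach discussed in the comment can be sketched as follows. This is a minimal illustration using Java's standard `java.util.zip.CRC32`; the column layout (`name length, name, value length, value, timestamp`) and the class and method names are hypothetical, not Cassandra's actual on-disk serialization format. It shows both the 4-byte-per-column overhead being debated and how such a checksum catches a single flipped bit that would otherwise deserialize cleanly:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

// Illustrative sketch only: layout and names are not Cassandra's real format.
public class ColumnChecksumSketch {

    // Serialize a column (name, value, timestamp) and append a CRC32
    // of the serialized bytes: exactly 4 extra bytes per column.
    static byte[] writeWithChecksum(byte[] name, byte[] value, long timestamp) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeShort(name.length);
            out.write(name);
            out.writeInt(value.length);
            out.write(value);
            out.writeLong(timestamp);
            byte[] serialized = bos.toByteArray();

            CRC32 crc = new CRC32();
            crc.update(serialized);
            out.writeInt((int) crc.getValue()); // the 4-byte overhead
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for in-memory streams
        }
    }

    // On read: recompute CRC32 over everything except the trailing 4 bytes
    // and compare it to the stored (big-endian) checksum.
    static boolean verify(byte[] columnWithChecksum) {
        int dataLen = columnWithChecksum.length - 4;
        CRC32 crc = new CRC32();
        crc.update(columnWithChecksum, 0, dataLen);
        int stored = ((columnWithChecksum[dataLen] & 0xff) << 24)
                   | ((columnWithChecksum[dataLen + 1] & 0xff) << 16)
                   | ((columnWithChecksum[dataLen + 2] & 0xff) << 8)
                   |  (columnWithChecksum[dataLen + 3] & 0xff);
        return (int) crc.getValue() == stored;
    }

    public static void main(String[] args) {
        byte[] col = writeWithChecksum("name".getBytes(), "value".getBytes(), 1234L);
        System.out.println(verify(col)); // true: intact column checks out
        col[5] ^= 0x01;                  // simulate one bit of bitrot in the name
        System.out.println(verify(col)); // false: corruption is now detectable
    }
}
```

The same pattern applied at the compressed-block level (a CRC stored alongside each compressed block and verified before decompression) would address the second concern in the comment, since decompression succeeding does not by itself prove the compressed bytes were intact.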