[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079616#comment-13079616 ]
Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/4/11 9:39 PM:
--------------------------------------------------------------------

This is a good idea, but it has a few complications:
- the buffer length should be stored so that the reader can use it
- reads should be aligned to that buffer length so that we always read a whole checksummed chunk of data, which implies that we will potentially need to read more data on each request

This seems to be a clear tradeoff between using additional space to store a checksum for the index + columns of each row vs. doing more I/O...

was (Author: xedin):
This is a good idea but it has few complications:
- buffer length should be store in order to be used by reader
- reads should be aligned by that buffer length so we always read a whole checksummed chunk of the data which implies that we will potentially always need to read more data on each request

This seems to be a clear tradeoff between using additional space to store checksum for index + columns for each row v.s. doing more I/O...

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row)
> unreadable, so the data can be replaced by read repair or anti-entropy. But
> if the corruption keeps column data readable we do not detect it, and if it
> corrupts the timestamp to a higher value, the column can even resist being
> overwritten by newer values.
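
The two complications raised in the comment (storing the buffer length for the reader, and aligning reads to whole checksummed chunks) can be illustrated with a minimal Python sketch. This is not Cassandra's on-disk format or the patch attached to the ticket: the header layout, the choice of CRC32, and the names `write_checksummed` and `read_range` are all assumptions made for the example.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical layout (an assumption for this sketch, not Cassandra's format):
#   header : 4-byte chunk length + 8-byte total payload length
#   body   : for each chunk, the payload bytes followed by a 4-byte CRC32
HEADER = struct.Struct(">IQ")
CRC = struct.Struct(">I")


def write_checksummed(data: bytes, chunk_len: int) -> bytes:
    """Store the chunk length up front so a reader can align its reads."""
    out = bytearray(HEADER.pack(chunk_len, len(data)))
    for off in range(0, len(data), chunk_len):
        chunk = data[off:off + chunk_len]
        out += chunk
        out += CRC.pack(zlib.crc32(chunk))
    return bytes(out)


def read_range(blob: bytes, offset: int, length: int) -> Tuple[bytes, int]:
    """Return (requested bytes, payload bytes actually read and verified).

    Every chunk that overlaps the requested range is read in full, because
    a checksum can only be verified over the whole chunk it covers.
    """
    chunk_len, total = HEADER.unpack_from(blob, 0)
    if offset < 0 or length < 0 or offset + length > total:
        raise ValueError("range out of bounds")
    if length == 0:
        return b"", 0

    first = offset // chunk_len
    last = (offset + length - 1) // chunk_len
    payload = bytearray()
    for idx in range(first, last + 1):
        # Only the final chunk can be short, so earlier chunks all
        # occupy chunk_len + CRC.size bytes on disk.
        start = HEADER.size + idx * (chunk_len + CRC.size)
        size = min(chunk_len, total - idx * chunk_len)
        chunk = blob[start:start + size]
        (stored,) = CRC.unpack_from(blob, start + size)
        if zlib.crc32(chunk) != stored:
            raise IOError(f"checksum mismatch in chunk {idx}")
        payload += chunk

    skip = offset - first * chunk_len
    return bytes(payload[skip:skip + length]), len(payload)


data = bytes(range(256)) * 10                # 2560 bytes of sample payload
blob = write_checksummed(data, 1024)
wanted, bytes_read = read_range(blob, 1000, 50)
```

In this sketch a 50-byte request that straddles a chunk boundary forces two full 1024-byte chunks to be read and verified, which is the extra I/O the comment refers to. A smaller chunk length reduces that overhead but spends more space on checksums.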