[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Pavel Yaskevich (JIRA) Fri, 05 Aug 2011 10:15:51 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080090#comment-13080090
 ]


Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

bq. can't you just implement a no-op compression option that will utilize what 
you're doing / planning to do for compression in terms of block structure and 
block level checksums? Good question. Pavel?

That sounds like special-casing, and it has the complications mentioned before: 
more I/O, a buffer size that has to be maintained, and it won't play nicely with 
mmap. Placing it at the block level will make it harder to create tools for 
processing corruption (as Jake mentioned), because we think in terms of the 
data model, not in terms of file blocks.

First of all, we should define the goal we are pursuing here, which is essential.

If this is only about repair and replication, I think a good approach would be 
to checksum at the row boundary level, which would be relatively simple to 
check and would play nicely with mmap.
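
To illustrate the row-boundary idea, here is a minimal sketch, not taken from 
any patch on this ticket. It assumes a hypothetical layout in which a 4-byte 
CRC32 trailer follows each serialized row; the class name RowChecksum and the 
methods seal/verify are made up for the example. Verification reads straight 
from a ByteBuffer, which could just as well be a mapped region.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper: one CRC32 trailer per serialized row.
final class RowChecksum {
    // Returns the row bytes followed by a 4-byte big-endian CRC32 of those bytes.
    static byte[] seal(byte[] row) {
        CRC32 crc = new CRC32();
        crc.update(row, 0, row.length);
        return ByteBuffer.allocate(row.length + 4)
                .put(row)
                .putInt((int) crc.getValue())
                .array();
    }

    // Checks a sealed row in place; the caller's buffer position is not moved.
    static boolean verify(ByteBuffer sealed) {
        int len = sealed.remaining() - 4;
        if (len < 0) {
            return false; // too short to even hold the trailer
        }
        ByteBuffer view = sealed.duplicate();
        view.limit(view.position() + len);
        CRC32 crc = new CRC32();
        crc.update(view); // checksums the row bytes without copying them out
        int stored = sealed.getInt(sealed.position() + len);
        return stored == (int) crc.getValue();
    }
}
```

A reader would call RowChecksum.verify on the slice covering one row before 
deserializing it, and treat a mismatch the same way as an unreadable row.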

I still think that the best way to check for corruption would be to use 
checksums at the row header (key and row index) and column level, even if that 
introduces disk space and CPU overhead (the necessary sacrifice). This could be 
the most elegant solution for a few reasons, two of which are: it introduces no 
system-wide complexity (aka special-casing) in how we work with SSTables and 
repair, and it allows us to think in terms of our data model.
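
As a rough sketch of the column-level variant (again an illustration under 
assumptions, not Cassandra's actual Column class or on-disk format), the 
checksum below covers the name, value and timestamp, so a bit flip that leaves 
the column readable but raises its timestamp is still caught. The class 
ChecksummedColumn and its fields are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical column carrying its own CRC32 over name, value and timestamp.
final class ChecksummedColumn {
    final byte[] name;
    final byte[] value;
    final long timestamp;
    final int checksum;

    ChecksummedColumn(byte[] name, byte[] value, long timestamp, int checksum) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
        this.checksum = checksum;
    }

    // Builds a column and stamps it with a freshly computed checksum.
    static ChecksummedColumn create(String name, String value, long timestamp) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return new ChecksummedColumn(n, v, timestamp, compute(n, v, timestamp));
    }

    // Length-prefixes name and value so their boundary is part of the checksum.
    static int compute(byte[] name, byte[] value, long timestamp) {
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length + 4 + value.length + 8);
        buf.putInt(name.length).put(name);
        buf.putInt(value.length).put(value);
        buf.putLong(timestamp);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.capacity());
        return (int) crc.getValue();
    }

    // True when the stored checksum still matches the column contents.
    boolean isIntact() {
        return checksum == compute(name, value, timestamp);
    }
}
```

The cost is 4 extra bytes and one CRC pass per column, which is the disk space 
and CPU overhead referred to above.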

But it somehow feels like we are missing a better solution here...



> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if the 
> timestamp is corrupted to a higher value, the column can even resist being 
> overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
