[
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080090#comment-13080090
]
Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------
bq. can't you just implement a no-op compression option that will utilize what
you're doing / planning to do for compression in terms of block structure and
block level checksums? Good question. Pavel?
That sounds like special-casing, and it has the complications mentioned before:
more I/O, the need to hold a buffer of block size, and it won't play nicely
with mmap. Placing checksums at the block level will also make it harder to
create tools that process corruption (as Jake mentioned), because we think in
terms of the data model, not in terms of file blocks.
First of all, we should define the goal we are pursuing with this, which is
essential.
If this is only about repair and replication, I think a good approach would be
to checksum at the row boundary level, which would be relatively simple to
check and would play nicely with mmap.
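To make the row-boundary idea concrete, here is a minimal sketch, assuming a row
occupies a known [rowStart, rowEnd) byte range in a buffer that may be
memory-mapped. The class and method names are hypothetical and the layout is
illustrative only; this is not Cassandra's on-disk format or code.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: names and row layout are assumptions, not Cassandra code.
final class RowChecksumSketch {

    // CRC32 over the serialized bytes of one row, given as [rowStart, rowEnd).
    // The buffer can be a MappedByteBuffer; duplicate() shares the content but
    // keeps the caller's position and limit untouched.
    static long rowChecksum(ByteBuffer data, int rowStart, int rowEnd) {
        ByteBuffer row = data.duplicate();
        row.limit(rowEnd);
        row.position(rowStart);
        CRC32 crc = new CRC32();
        crc.update(row); // consumes bytes from position to limit
        return crc.getValue();
    }

    // Compares a freshly computed checksum with the one stored alongside the row.
    static boolean verifyRow(ByteBuffer data, int rowStart, int rowEnd, long stored) {
        return rowChecksum(data, rowStart, rowEnd) == stored;
    }
}
```

Because the check only needs a view over the row's byte range, it does not
require copying the row into a separate block-sized buffer.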
I still think that the best way to check for corruption is to use checksums at
the row header (key and row index) and column level, even if that introduces
disk space and CPU overhead (a necessary sacrifice). This could be the most
elegant solution for a few reasons, two of which are: it introduces no
system-wide complexity (aka special-casing) in how we work with SSTables and
repair, and it allows us to think in our data model terms.
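A column-level checksum could look roughly like the sketch below, assuming a
column is just a name, a value, and a timestamp. The class, the field order, and
the length prefixes are hypothetical choices made for illustration, not an
existing Cassandra format. Covering the timestamp is the point: a flipped bit
that leaves the column readable but raises its timestamp no longer verifies.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: the column layout here is an assumption for illustration.
final class ColumnChecksumSketch {

    // CRC32 over name, value, and timestamp. Length prefixes keep the boundary
    // between name and value unambiguous.
    static long columnChecksum(byte[] name, byte[] value, long timestamp) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(4).putInt(name.length).array());
        crc.update(name);
        crc.update(ByteBuffer.allocate(4).putInt(value.length).array());
        crc.update(value);
        crc.update(ByteBuffer.allocate(8).putLong(timestamp).array());
        return crc.getValue();
    }

    // True when the column read from disk still matches its stored checksum.
    static boolean isIntact(byte[] name, byte[] value, long timestamp, long stored) {
        return columnChecksum(name, value, timestamp) == stored;
    }
}
```

The cost is one extra checksum per column on disk plus a CRC computation on each
read, which is the disk space and CPU overhead mentioned above.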
But it somehow feels like we are missing a better solution here...
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row)
> unreadable, so the data can be replaced by read repair or anti-entropy. But
> if the corruption keeps column data readable we do not detect it, and if it
> corrupts to a higher timestamp value can even resist being overwritten by
> newer values.