[
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079979#comment-13079979
]
Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------
bq. Block level and column index level are actually the same right? 64kb
True, but that only covers the big indexed rows. If you have lots of tiny
rows, the overhead is much bigger. Block level is more consistent and
predictable. And with column index level you also need to checksum the row
header, so the overhead is slightly greater anyway, even for big rows. It's
also a bit more complicated conceptually: you need to checksum the row header
and body separately, and distinguish between indexed and non-indexed rows.
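To make the block-level idea concrete, here is a minimal sketch (hypothetical
code, not the attached patch): data is buffered into fixed 64KB blocks and
each block is written as a length prefix, the data, and its CRC32, so the
overhead is a constant few bytes per block no matter how small the rows are.
{code}
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;

// Minimal sketch of block-level checksumming; hypothetical, not the
// actual Cassandra/patch code. Each block is written as
// [length][data][crc32], so the overhead is fixed per block rather
// than per row.
public class ChecksummedBlockWriter implements Closeable
{
    private static final int BLOCK_SIZE = 64 * 1024;

    private final DataOutputStream out;
    private final byte[] buffer = new byte[BLOCK_SIZE];
    private int position = 0;

    public ChecksummedBlockWriter(OutputStream out)
    {
        this.out = new DataOutputStream(out);
    }

    public void write(byte[] data, int offset, int length) throws IOException
    {
        while (length > 0)
        {
            int n = Math.min(length, BLOCK_SIZE - position);
            System.arraycopy(data, offset, buffer, position, n);
            position += n;
            offset += n;
            length -= n;
            if (position == BLOCK_SIZE)
                flushBlock();
        }
    }

    private void flushBlock() throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, position);
        out.writeInt(position);             // block length prefix
        out.write(buffer, 0, position);
        out.writeInt((int) crc.getValue()); // 4-byte CRC32 per block
        position = 0;
    }

    public void close() throws IOException
    {
        if (position > 0)
            flushBlock(); // final, possibly short, block
        out.close();
    }
}
{code}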
bq. The reason block isn't ideal to me is it makes it much harder to
recover/support partial reads since the block has no context in the file format
I agree, as I mentioned earlier, that it is harder. I'm not sure about the
'much' (at least for recovery), however. With the row index, I'm sure it's not
too hard to drop only the corrupt block (and maybe a little around it) to get
something consistent. Yes, it means we will always drop more than with column
index level, but imho bitrot doesn't happen often enough for that to matter
much (though I understand one could disagree).
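The read side of the recovery story would look roughly like this (again a
hypothetical sketch, assuming the writer above): verify each block's CRC as it
is read, and on a mismatch drop that whole block and use the row index to
resync at the next row that starts in an intact block.
{code}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Read-side counterpart to the writer sketch above (hypothetical).
// A corrupt block is detected wholesale; recovery drops it (plus any
// rows straddling it) and uses the row index to find the next row
// that starts in an intact block.
public class ChecksummedBlockReader
{
    private final DataInputStream in;

    public ChecksummedBlockReader(InputStream in)
    {
        this.in = new DataInputStream(in);
    }

    // Returns the next verified block, or null at end of file.
    public byte[] nextBlock() throws IOException
    {
        int length;
        try
        {
            length = in.readInt();
        }
        catch (EOFException e)
        {
            return null; // clean end of file
        }
        byte[] block = new byte[length];
        in.readFully(block);
        int storedCrc = in.readInt();

        CRC32 crc = new CRC32();
        crc.update(block, 0, length);
        if ((int) crc.getValue() != storedCrc)
            throw new IOException("block checksum mismatch; drop block and resync via row index");
        return block;
    }
}
{code}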
Also, with column index, you can still have bitrot of the row header, in which
case the whole row is still screwed.
Anyway, don't get me wrong, I'm not saying that column index is a stupid idea.
I do think, however, that for some non-exceptional use cases (small rows, i.e.,
probably most of the 'static CFs'), the overhead will be much larger than with
block level: a 4-byte checksum on every, say, 100-byte row is 4% overhead,
versus a few bytes per 64KB block. I also think block level is cleaner in that
you don't have to care about different cases. On the other hand, the advantages
of column index level only matter in the exceptional case of bitrot (not the
case we should optimize for imho), and even then it is only slightly more
efficient.
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
> Key: CASSANDRA-1717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Pavel Yaskevich
> Fix For: 1.0
>
> Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row)
> unreadable, so the data can be replaced by read repair or anti-entropy. But
> if the corruption keeps column data readable we do not detect it, and if it
> corrupts to a higher timestamp value, it can even resist being overwritten
> by newer values.
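To illustrate the 'resists being overwritten' part of the description: with
last-write-wins reconciliation, a bit flip that raises a stored timestamp
makes the corrupt cell win every future reconciliation. A minimal sketch
(hypothetical; not Cassandra's actual Column.reconcile signature):
{code}
// Hypothetical sketch of timestamp-based (last-write-wins)
// reconciliation, showing why corrupt-but-readable data with an
// inflated timestamp cannot be fixed by overwrites alone.
public class ReconcileSketch
{
    static final class Cell
    {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp)
        {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Higher timestamp wins.
    static Cell reconcile(Cell a, Cell b)
    {
        return a.timestamp >= b.timestamp ? a : b;
    }

    public static void main(String[] args)
    {
        long now = 1313000000000L;
        Cell legit = new Cell("new value", now);
        // bitrot flips a high-order bit of the stored timestamp
        Cell corrupt = new Cell("garbage", now | (1L << 52));
        // the corrupt cell outranks every legitimate newer write
        System.out.println(reconcile(legit, corrupt).value); // prints "garbage"
    }
}
{code}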