[ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079979#comment-13079979 ]

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. Block level and column index level are actually the same right? 64kb

True, but that only holds for the big, indexed rows. If you have lots of tiny 
rows, the overhead is much bigger. Block level is more consistent and 
predictable. And at column index level you also need to checksum the row 
header, so the overhead is slightly greater anyway, even for big rows. It's 
also a bit more complicated conceptually: you need to checksum the row header 
and body separately, and distinguish between indexed and non-indexed rows.
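
To make concrete what I mean by block level, here's a rough sketch (purely illustrative, not proposed code): on the write path, every 64kb of serialized data gets a trailing CRC32, regardless of where rows start and end, so there is exactly one case to handle.

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Illustrative only: buffers writes into fixed-size blocks and appends
// a CRC32 after each block, independent of row boundaries.
public class ChecksummedBlockWriter
{
    public static final int BLOCK_SIZE = 64 * 1024; // 64kb, same as the column index size

    private final DataOutputStream out;
    private final byte[] buffer = new byte[BLOCK_SIZE];
    private int position = 0;

    public ChecksummedBlockWriter(DataOutputStream out)
    {
        this.out = out;
    }

    public void write(byte[] data, int offset, int length) throws IOException
    {
        while (length > 0)
        {
            int chunk = Math.min(length, BLOCK_SIZE - position);
            System.arraycopy(data, offset, buffer, position, chunk);
            position += chunk;
            offset += chunk;
            length -= chunk;
            if (position == BLOCK_SIZE)
                flushBlock();
        }
    }

    // Write the buffered block followed by its checksum.
    private void flushBlock() throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, position);
        out.write(buffer, 0, position);
        out.writeLong(crc.getValue());
        position = 0;
    }

    public void close() throws IOException
    {
        if (position > 0)
            flushBlock(); // the last block may be short
        out.close();
    }
}
{code}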

bq. The reason block isn't ideal to me is it makes it much harder to 
recover/support partial reads since the block has no context in the file format

As I mentioned earlier, I agree it is harder; I'm not sure about the "much" 
(at least for recovery), however. With the row index, I'm sure it's not too 
hard to drop only the corrupted block, plus maybe a little around it, and end 
up with something consistent. Yes, that means we will always drop more than 
at column index level, but imho bitrot doesn't happen often enough for that 
to matter much (though I understand one could disagree).
Also, with column index, you can still get bitrot in the row header, in which 
case the whole row is screwed anyway.
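
For recovery, I'm thinking of something along these lines (again only a sketch; it assumes full-size blocks for simplicity): verify the trailing checksum on read, and on mismatch hand back nothing, so the caller can use the row index to skip past the bad block to the next indexed row.

{code:java}
import java.io.DataInputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Illustrative only: reads one block plus its trailing checksum and
// verifies it. A null return means "drop this block and resynchronize
// at the next row index entry past it".
public class ChecksummedBlockReader
{
    public static final int BLOCK_SIZE = 64 * 1024;

    public static byte[] readBlock(DataInputStream in) throws IOException
    {
        byte[] block = new byte[BLOCK_SIZE];
        in.readFully(block);
        long expected = in.readLong();

        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        if (crc.getValue() != expected)
            return null; // bitrot: caller drops the block and a bit around it

        return block;
    }
}
{code}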

Anyway, don't get me wrong, I'm not saying that column index is a stupid idea. 
I think however that for some non-exceptional use cases (small rows, i.e. 
probably most of the 'static CF' ones), the overhead will be much bigger than 
with block level. I also think block level is cleaner in that you don't have 
to care about different cases. On the other hand, the advantages of column 
index level only show up in the exceptional case of bitrot (not the case we 
should optimize for imho), and even then it is only somewhat more efficient.
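
To put rough numbers on the overhead (assuming a 4-byte CRC32 per checksum): at block level it's 4 bytes per 64kb block, i.e. ~0.006%, whatever the row size. At column index level, a CF of 100-byte rows pays at least 4 bytes per row, i.e. 4%, and more once you count the separate row header checksum, while a big indexed row pays roughly the same ~0.006% as block level. That asymmetry is the small-rows overhead I'm talking about.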

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) 
> unreadable, so the data can be replaced by read repair or anti-entropy.  But 
> if the corruption keeps column data readable we do not detect it, and if it 
> corrupts to a higher timestamp value can even resist being overwritten by 
> newer values.
