[ 
https://issues.apache.org/jira/browse/CASSANDRA-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aishwarya Ganesan updated CASSANDRA-12682:
------------------------------------------
    Description: 
Corrupted SSTable data can be silently returned to users when SSTable
compression is disabled.

Cassandra maintains a digest.crc32 and a CRC.db file in the SSTable directory
but fails to detect corruption of the SSTable's Data.db. Without such
detection, Cassandra is vulnerable to silent corruption caused by faults in the
underlying disks and the file systems on top of them. Studies support the need
for end-to-end integrity checks:
https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

In a small test case where the underlying disk/FS corrupts a particular block
holding user data, Cassandra can silently return the corrupted data on a read
request. In addition, read repair or anti-entropy repair can propagate the
corrupted data to intact replicas when the corrupted value is lexically
greater. This happens because a corruption does not change the timestamps, and
timestamp ties are resolved by choosing the data with the highest value. (We
reproduced this scenario using our testing framework.)
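
The tie-breaking behaviour described above can be illustrated with a minimal
sketch. This is not Cassandra's actual reconciliation code; the Cell type and
the reconcile function are hypothetical names used only to show why a
corrupted value with an unchanged timestamp can win over an intact one.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for one stored value and its write timestamp."""
    value: bytes
    timestamp: int


def reconcile(a: Cell, b: Cell) -> Cell:
    """Newer timestamp wins; on a timestamp tie, the greater value wins."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Tie: fall back to a byte-wise (lexical) comparison of the values.
    return a if a.value >= b.value else b


intact = Cell(b"hello", 1000)
# One byte damaged on disk; the timestamp is untouched.
corrupted_greater = Cell(b"hellp", 1000)
corrupted_smaller = Cell(b"helln", 1000)

# The corrupted copy wins only when it happens to compare greater.
winner_when_greater = reconcile(intact, corrupted_greater)
winner_when_smaller = reconcile(intact, corrupted_smaller)
```

In this sketch the corrupted replica is chosen in the first case and the intact
one in the second, which matches the condition stated above: propagation
happens when the corrupted value is lexically greater.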

Why does Cassandra not use the CRC and digests to verify the integrity of data
in the SSTables on read? Are the digest.crc32 and CRC.db files ever used?
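
For illustration, a read-time check of the kind asked about could look roughly
like the sketch below. It assumes a CRC32 stored alongside each data block;
the function names and the per-block layout are assumptions made for this
example and do not describe the actual format of digest.crc32 or CRC.db.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC32 of a data block as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def verified_read(data: bytes, expected_crc: int) -> bytes:
    """Return the block only if it matches the checksum recorded at write time."""
    if checksum(data) != expected_crc:
        raise IOError("checksum mismatch: data block is corrupted")
    return data


block = b"user data stored in Data.db"
stored_crc = checksum(block)  # recorded when the block was written

# Simulate a disk/FS fault by flipping a single bit in the block.
damaged = bytearray(block)
damaged[3] ^= 0x01
```

With such a check, verified_read(block, stored_crc) returns the data, while
verified_read(bytes(damaged), stored_crc) raises an error instead of silently
returning the corrupted bytes.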

  was:
Corruptions in Cassandra's SSTable data can be silently returned to users if 
SSTable compression is disabled. 

Cassandra maintains a digest.crc32 and CRC.db in the sstable directory but 
fails to detect the corruptions to SSTable Data.db. Without this, Cassandra is 
vulnerable to silent corruptions resulting from underlying problems in disks 
and file systems atop them. Research support the need for end to end integrity:
https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

In a small test case where the underlying disk/FS corrupts a particular block 
holding the user data, Cassandra can silently return corrupted user data on a 
read request. Also, the read repair or anti-entropy can propagate the corrupted 
data to other intact replicas when the corrupted value is lexically greater. 
This is because a corruption doesn't change the timestamps and timestamp 
conflicts are resolved by choosing the data with the highest value. (We 
reproduced this scenario using our testing framework)

Why does Cassandra not use the CRC and digests to verify the integrity of data 
in the SStables on read? Are the digest.crc32 and CRC.db files ever used?


> Silent data corruption and corruption propagation in Cassandra
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12682
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12682
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Aishwarya Ganesan



