[ https://issues.apache.org/jira/browse/CASSANDRA-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Jirsa updated CASSANDRA-12682:
-----------------------------------
Fix Version/s: 4.x
> Silent data corruption and corruption propagation in Cassandra
> --------------------------------------------------------------
>
> Key: CASSANDRA-12682
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12682
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Aishwarya Ganesan
> Labels: Correctness
> Fix For: 4.x
>
>
> Corrupted SSTable data can be silently returned to users when SSTable
> compression is disabled.
> Cassandra maintains digest.crc32 and CRC.db files in the SSTable directory but
> fails to detect corruption in the SSTable Data.db file. Without such
> detection, Cassandra is vulnerable to silent corruption caused by problems in
> the underlying disks and the file systems on top of them. Studies support the
> need for end-to-end integrity:
> https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
> http://www.cs.toronto.edu/~bianca/papers/fast08.pdf
> In a small test case where the underlying disk/FS corrupts a particular block
> holding user data, Cassandra can silently return the corrupted data on a
> read request. In addition, read repair or anti-entropy repair can propagate
> the corrupted data to other, intact replicas when the corrupted value is
> lexically greater. This happens because corruption does not change the
> timestamps, and timestamp ties are resolved by choosing the greater value.
> (We reproduced this scenario using our testing framework.)
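The propagation scenario described above can be illustrated with a minimal sketch. It assumes a simplified last-write-wins rule with a byte-wise tie-break on the value; the `Cell` type and `reconcile` function are hypothetical names for illustration, not Cassandra's actual classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one stored value and its write timestamp."""
    value: bytes
    timestamp: int


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick a winner: the newer timestamp first, the greater value on a tie."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Timestamps are equal: fall back to a byte-wise comparison of the values.
    return a if a.value >= b.value else b


# An intact replica and a replica whose value was corrupted on disk.
# The corruption changed one byte ('e' -> 'f') but left the timestamp alone.
intact = Cell(value=b"hello", timestamp=1000)
corrupted_greater = Cell(value=b"hfllo", timestamp=1000)
corrupted_smaller = Cell(value=b"hdllo", timestamp=1000)

# The corrupted copy wins the tie because it compares greater, so a repair
# based on this rule would overwrite the intact replica.
print(reconcile(intact, corrupted_greater))
# A corrupted copy that compares smaller loses the tie and is not propagated.
print(reconcile(intact, corrupted_smaller))
```

Under this assumed rule, nothing in the reconciliation step can tell a corrupted value from a legitimate concurrent write, which is why detection has to happen before the value is read.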
> Why does Cassandra not use the CRC and digest files to verify the integrity of
> data in the SSTables on read? Are the digest.crc32 and CRC.db files ever used?
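As a rough illustration of what verification on read could look like, the sketch below checks per-chunk CRC32 values before returning any bytes. The chunk size, the in-memory layout, and the names `chunk_checksums`, `verified_read`, and `CorruptionError` are assumptions made for this example; they do not describe the actual format of CRC.db or digest.crc32.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


class CorruptionError(Exception):
    """Raised when a chunk does not match its recorded checksum."""


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Compute one CRC32 per fixed-size chunk, as would be done at write time."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def verified_read(data: bytes, checksums: List[int], offset: int, length: int,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset+length] after verifying every chunk it touches."""
    if offset < 0 or length < 0 or offset + length > len(data):
        raise ValueError("read range is outside the data")
    if length == 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    for idx in range(first, last + 1):
        chunk = data[idx * chunk_size:(idx + 1) * chunk_size]
        if zlib.crc32(chunk) != checksums[idx]:
            raise CorruptionError("checksum mismatch in chunk %d" % idx)
    return data[offset:offset + length]


# Usage: checksums are recorded at write time, then one bit flips on disk.
original = bytes(range(256)) * 4
sums = chunk_checksums(original, chunk_size=256)
damaged = bytearray(original)
damaged[305] ^= 0x01  # simulate a single-bit corruption in the second chunk

print(verified_read(original, sums, 300, 10, chunk_size=256))
try:
    verified_read(bytes(damaged), sums, 300, 10, chunk_size=256)
except CorruptionError as exc:
    print("read rejected:", exc)
```

With a check of this kind, the damaged replica would fail the read instead of returning a value that could later win a timestamp tie.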