Ariel Weisberg created CASSANDRA-9264:
-----------------------------------------
Summary: Cassandra should not persist files without checksums
Key: CASSANDRA-9264
URL: https://issues.apache.org/jira/browse/CASSANDRA-9264
Project: Cassandra
Issue Type: Wish
Reporter: Ariel Weisberg
Fix For: 3.x
Even if checksums aren't validated on the read side every time, it is helpful
to persist files with checksums so that if a corrupted file is encountered you
can at least confirm that the issue is corruption and not an application-level
error that generated a corrupt file.
We should standardize on conventions for how to checksum a file and which
checksums to use so we can ensure we get the best performance possible.
For a small checksum I think we should use CRC32 because the hardware support
appears quite good.
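
As a rough illustration of the small-checksum case, the sketch below appends a
4-byte CRC32 to a payload and verifies it on read, using the JDK's
java.util.zip.CRC32. The class name Crc32Trailer, the trailer position, and the
big-endian layout are assumptions made for this example, not an existing
Cassandra format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: frames a payload as [payload][4-byte big-endian CRC32].
final class Crc32Trailer {
    // Returns the payload followed by its CRC32.
    static byte[] append(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue())
                .array();
    }

    // True if the trailing 4 bytes match the CRC32 of the preceding bytes.
    static boolean verify(byte[] framed) {
        if (framed.length < 4) {
            return false;
        }
        int bodyLength = framed.length - 4;
        CRC32 crc = new CRC32();
        crc.update(framed, 0, bodyLength);
        int stored = ByteBuffer.wrap(framed, bodyLength, 4).getInt();
        return stored == (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] framed = append("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(framed)); // intact payload
        framed[0] ^= 1;                     // simulate a single flipped bit
        System.out.println(verify(framed)); // corrupted payload
    }
}
```

With a trailer like this, a reader that hits a bad file can tell on-disk
corruption apart from a writer that produced bad contents in the first place.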
For cases where a 4-byte checksum is not enough I think we can look at either
xxhash64 or MurmurHash3.
The problem with xxhash64 is that its output is only 8 bytes. The problem with
MurmurHash3 is that the Java implementation is slow. If we can live with
8 bytes and make it easy to switch hash implementations, I think xxhash64 is a
good choice because we already ship a good implementation with LZ4.
I would also like to see hashes always prefixed by a type so that we can swap
hashes without running into pain trying to figure out what hash implementation
is present. I would also like to avoid making assumptions about the number of
bytes in a hash field where possible, keeping in mind compatibility and space
issues.
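
One possible shape for a type-prefixed hash field is sketched below: a one-byte
type id, a one-byte length, then the digest bytes, so a reader never has to
guess the algorithm or the width. Everything here is hypothetical: the class
name TypedChecksum, the type ids, and the field layout are not taken from
Cassandra. SHA-256 from the JDK stands in for a wider hash such as xxhash64
only because it is available without extra dependencies.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical field layout: [type: 1 byte][length: 1 byte][digest: length bytes]
final class TypedChecksum {
    // Assumed type ids, invented for this sketch.
    static final byte TYPE_CRC32 = 1;
    static final byte TYPE_SHA256 = 2; // stand-in for a wider hash such as xxhash64

    // Computes the raw digest bytes for the given type id.
    static byte[] digest(byte type, byte[] data) {
        switch (type) {
            case TYPE_CRC32: {
                CRC32 crc = new CRC32();
                crc.update(data, 0, data.length);
                return ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
            }
            case TYPE_SHA256:
                try {
                    return MessageDigest.getInstance("SHA-256").digest(data);
                } catch (NoSuchAlgorithmException e) {
                    throw new IllegalStateException(e);
                }
            default:
                throw new IllegalArgumentException("unknown checksum type " + type);
        }
    }

    // Builds the self-describing field for the given data.
    static byte[] encode(byte type, byte[] data) {
        byte[] d = digest(type, data);
        return ByteBuffer.allocate(2 + d.length)
                .put(type)
                .put((byte) d.length)
                .put(d)
                .array();
    }

    // Verifies data against a field, reading the type and length from the field itself.
    static boolean verify(byte[] field, byte[] data) {
        if (field.length < 2) {
            return false;
        }
        int length = field[1] & 0xFF;
        if (field.length != 2 + length) {
            return false;
        }
        byte[] expected;
        try {
            expected = digest(field[0], data);
        } catch (IllegalArgumentException e) {
            return false; // unknown type id
        }
        return Arrays.equals(expected, Arrays.copyOfRange(field, 2, 2 + length));
    }
}
```

The explicit length byte costs space on every field, which is the
compatibility and space trade-off mentioned above; a format could drop it
wherever the type id alone fixes the width.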
Hashing after compression is also preferable to hashing before compression.
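
A minimal sketch of the hash-after-compression ordering follows, using the
JDK's Deflater and Inflater purely as a stand-in compressor. The class and
method names are hypothetical. The checksum is computed over the compressed
bytes, so the reader can check the bytes as they sit on disk before handing
them to the decompressor.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper showing: compress first, then checksum the compressed bytes.
final class ChecksumAfterCompression {
    // Compresses raw bytes with DEFLATE (stand-in for any block compressor).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // CRC32 of the given bytes; called on the compressed form.
    static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // Verifies the stored checksum against the compressed bytes, then decompresses.
    static byte[] readVerified(byte[] compressed, long storedCrc) {
        if (crcOf(compressed) != storedCrc) {
            throw new IllegalStateException("checksum mismatch on compressed bytes");
        }
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

In this ordering the checksum covers exactly what was written, and when the
data compresses well there are fewer bytes to hash.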