[ https://issues.apache.org/jira/browse/CASSANDRA-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568245#comment-13568245 ]
Jonathan Ellis commented on CASSANDRA-5210:
-------------------------------------------
This sounds a lot like a custom comparator that doesn't actually impose a total
ordering of its data.
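For context, here is a minimal Java sketch (generic JDK code, not Cassandra's actual comparator classes) of how a comparator that breaks the total-ordering contract produces exactly this "present alone, missing in a slice" symptom: the element is stored, but any lookup that trusts the ordering can fail to find it.
{{
import java.util.Arrays;
import java.util.Comparator;

public class BrokenComparatorDemo {
    public static void main(String[] args) {
        // BROKEN: never returns 0, so compare(x, x) == 1; this violates the
        // total-order contract (sgn(compare(a, b)) must equal -sgn(compare(b, a))).
        Comparator<Integer> broken = new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return a < b ? -1 : 1;
            }
        };

        Integer[] values = { 5, 1, 4, 2, 3 };
        Arrays.sort(values, broken);                 // this input still ends up [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(values));

        // Binary search trusts the ordering contract; with the broken comparator
        // it reports a value as absent even though it is present in the array.
        System.out.println(Arrays.binarySearch(values, 3, broken)); // negative => "not found"
    }
}
}}
Cassandra stores and slices columns in comparator order, so a comparator that is not a genuine total order can leave a column reachable by a direct name lookup but skipped by a wider slice, which matches the behavior reported here.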
> DB is randomly and undetectably corrupted during high traffic column family
> flushes
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5210
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8,
> 0.8.9, 0.8.10, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8,
> 1.1.9, 1.2.0, 1.2.1
> Environment: Cassandra 0.8+, OS/X, java version "1.6.0_37"
> Reporter: Elden Bishop
>
> Writes during high-traffic column family flushes corrupt the DB and make
> slice queries return incorrect data.
> Any multi-column write on any version of Cassandra can put the DB in a state
> where some columns cannot be read alongside other columns.
> For example:
> {{
> // *** for any NON-NULL column (e.g. col_a => 'AAA')
> cqlsh> SELECT 'col_a' FROM test WHERE KEY='row_a';
> returns: 'AAA'
> // *** it can disappear when queried alongside another column
> cqlsh> SELECT 'col_a', 'col_b' FROM test WHERE KEY='row_a';
> returns: null, 'BBB' // *** col_a is MISSING
> // *** but it depends on the other columns
> cqlsh> SELECT 'col_a', 'col_b', 'col_c' FROM test WHERE KEY='row_a';
> returns: 'AAA', 'BBB', 'CCC' // *** col_a is BACK
> }}
> Once in this state the database is corrupt and essentially returns random
> data depending on which columns you query. Single-column queries always return
> correct results, so there is no way to verify the data. No errors are logged
> during corruption, and it is impossible to detect without querying all
> combinations of all columns.
> To reproduce (a rough sketch of the loop follows the steps):
> 1. Unzip a distribution of Cassandra and create a test.test column family.
> 2. In a loop, alternate between updating row 'a' and a random row.
>    Write a random value to four random columns (out of 10000). Keep track
>    of all columns set in row 'a'.
> 3. On each pass through the loop, query four random columns (out of 10000)
>    from row 'a'. If a column that is known to be set comes back null, print
>    out the columns that were requested in the query.
> 4. The DB is now corrupt and will return the column if queried by itself but
> will return null if queried alongside the columns that triggered the error.
> This is a permanent condition.
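> A rough Java sketch of the loop, using row 'row_a' to match the cqlsh example
> above (the actual test is a Groovy script; {{CassandraClient}}, {{insert}},
> and {{sliceGet}} below are hypothetical placeholders for whichever client API
> is used, not real Cassandra classes):
> {{
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Map;
> import java.util.Random;
> import java.util.Set;
> import java.util.UUID;
>
> public class CorruptionRepro {
>
>     // Hypothetical minimal client interface -- a stand-in for whatever
>     // driver (Thrift, Hector, ...) is actually used; NOT a real Cassandra API.
>     interface CassandraClient {
>         void insert(String keyspace, String cf, String row, Map<String, String> cols);
>         Map<String, String> sliceGet(String keyspace, String cf, String row, List<String> cols);
>     }
>
>     static void run(CassandraClient client) {
>         Random rnd = new Random();
>         Set<String> knownInRowA = new HashSet<String>(); // columns known to be set in row 'row_a'
>
>         while (true) {
>             // Step 2: write a random value to four random columns (out of 10000)
>             // in either row 'row_a' or a random row.
>             String row = rnd.nextBoolean() ? "row_a" : "row_" + rnd.nextInt(1000);
>             Map<String, String> cols = new HashMap<String, String>();
>             for (int i = 0; i < 4; i++) {
>                 cols.put("col_" + rnd.nextInt(10000), UUID.randomUUID().toString());
>             }
>             client.insert("test", "test", row, cols);
>             if (row.equals("row_a")) {
>                 knownInRowA.addAll(cols.keySet());
>             }
>
>             // Step 3: slice-read four random columns from row 'row_a' and check
>             // that every column known to be set actually comes back.
>             List<String> query = new ArrayList<String>();
>             for (int i = 0; i < 4; i++) {
>                 query.add("col_" + rnd.nextInt(10000));
>             }
>             Map<String, String> result = client.sliceGet("test", "test", "row_a", query);
>             for (String c : query) {
>                 if (knownInRowA.contains(c) && result.get(c) == null) {
>                     // Step 4: a column that is definitely set came back null.
>                     System.out.println("MISSING " + c + " when queried with " + query);
>                 }
>             }
>         }
>     }
> }
> }}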
> Observations: This bug only manifests immediately after a high-traffic column
> family flush appears in the log. This is a correlation based simply on
> watching the log; there are no errors or warnings of any kind.
> Workaround: Any multi-column read is potentially invalid and the corruption is
> virtually undetectable. The only workaround is to never write or read more
> than a single column per query.
> I have a simple Groovy script that can trigger the error. I have verified the
> behavior on Cassandra versions as old as 0.8.1.