[ https://issues.apache.org/jira/browse/CASSANDRA-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568245#comment-13568245 ]
Jonathan Ellis commented on CASSANDRA-5210:
-------------------------------------------
This sounds a lot like a custom comparator that doesn't actually impose a total
ordering of its data.
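For context, here is a minimal Java sketch (generic JDK code, not Cassandra's actual comparator classes) of how a comparator that breaks the total-ordering contract produces exactly this "present alone, missing in a slice" symptom: the element is stored, but any lookup that trusts the ordering can fail to find it.
{{
import java.util.Arrays;
import java.util.Comparator;

public class BrokenComparatorDemo {
    public static void main(String[] args) {
        // BROKEN: never returns 0, so compare(x, x) == 1; this violates the
        // total-order contract (sgn(compare(a, b)) must equal -sgn(compare(b, a))).
        Comparator<Integer> broken = new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return a < b ? -1 : 1;
            }
        };

        Integer[] values = { 5, 1, 4, 2, 3 };
        Arrays.sort(values, broken);                 // this input still ends up [1, 2, 3, 4, 5]
        System.out.println(Arrays.toString(values));

        // Binary search trusts the ordering contract; with the broken comparator
        // it reports a value as absent even though it is present in the array.
        System.out.println(Arrays.binarySearch(values, 3, broken)); // negative => "not found"
    }
}
}}
Cassandra stores and slices columns in comparator order, so a comparator that is not a genuine total order can leave a column reachable by a direct name lookup but skipped by a wider slice, which matches the behavior reported here.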
> DB is randomly and undetectably corrupted during high traffic column family
> flushes
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5210
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8,
> 0.8.9, 0.8.10, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8,
> 1.1.9, 1.2.0, 1.2.1
> Environment: Cassandra 0.8+, OS/X, java version "1.6.0_37"
> Reporter: Elden Bishop
>
> Writes during high-traffic column family flushes corrupt the DB and make
> slice queries return incorrect data.
> Any multi-column write on any version of Cassandra can put the DB in a state
> where some columns cannot be read alongside other columns.
> For example:
> {{
> // *** for any NON-NULL column (e.g. col_a => 'AAA')
> cqlsh> SELECT 'col_a' FROM test WHERE KEY='row_a';
> returns: 'AAA'
> // *** it can disappear when queried alongside another column
> cqlsh> SELECT 'col_a', 'col_b' FROM test WHERE KEY='row_a';
> returns: null, 'BBB' // *** col_a is MISSING
> // *** but it depends on the other columns
> cqlsh> SELECT 'col_a', 'col_b', 'col_c' FROM test WHERE KEY='row_a';
> returns: 'AAA', 'BBB', 'CCC' // *** col_a is BACK
> }}
> Once in this state the database is corrupt and essentially returns random
> data depending on which columns you query. Single-column queries always return
> correct results, so there is no way to verify the data. No errors are logged
> during corruption, and it is impossible to detect without querying all
> combinations of all columns.
> To reproduce (a rough sketch of the loop follows the steps):
> 1. Unzip a distribution of Cassandra and create a test.test column family.
> 2. In a loop, alternate between updating row 'a' and a random row.
>    Write a random value to four random columns (out of 10000). Keep track
>    of all columns set in row 'a'.
> 3. On each pass through the loop, query four random columns (out of 10000)
>    from row 'a'. If a column that is known to be set comes back null, print
>    out the columns that were requested in the query.
> 4. The DB is now corrupt and will return the column if queried by itself but
> will return null if queried alongside the columns that triggered the error.
> This is a permanent condition.
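> A rough Java sketch of the loop, using row 'row_a' to match the cqlsh example
> above (the actual test is a Groovy script; {{CassandraClient}}, {{insert}},
> and {{sliceGet}} below are hypothetical placeholders for whichever client API
> is used, not real Cassandra classes):
> {{
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Map;
> import java.util.Random;
> import java.util.Set;
> import java.util.UUID;
>
> public class CorruptionRepro {
>
>     // Hypothetical minimal client interface -- a stand-in for whatever
>     // driver (Thrift, Hector, ...) is actually used; NOT a real Cassandra API.
>     interface CassandraClient {
>         void insert(String keyspace, String cf, String row, Map<String, String> cols);
>         Map<String, String> sliceGet(String keyspace, String cf, String row, List<String> cols);
>     }
>
>     static void run(CassandraClient client) {
>         Random rnd = new Random();
>         Set<String> knownInRowA = new HashSet<String>(); // columns known to be set in row 'row_a'
>
>         while (true) {
>             // Step 2: write a random value to four random columns (out of 10000)
>             // in either row 'row_a' or a random row.
>             String row = rnd.nextBoolean() ? "row_a" : "row_" + rnd.nextInt(1000);
>             Map<String, String> cols = new HashMap<String, String>();
>             for (int i = 0; i < 4; i++) {
>                 cols.put("col_" + rnd.nextInt(10000), UUID.randomUUID().toString());
>             }
>             client.insert("test", "test", row, cols);
>             if (row.equals("row_a")) {
>                 knownInRowA.addAll(cols.keySet());
>             }
>
>             // Step 3: slice-read four random columns from row 'row_a' and check
>             // that every column known to be set actually comes back.
>             List<String> query = new ArrayList<String>();
>             for (int i = 0; i < 4; i++) {
>                 query.add("col_" + rnd.nextInt(10000));
>             }
>             Map<String, String> result = client.sliceGet("test", "test", "row_a", query);
>             for (String c : query) {
>                 if (knownInRowA.contains(c) && result.get(c) == null) {
>                     // Step 4: a column that is definitely set came back null.
>                     System.out.println("MISSING " + c + " when queried with " + query);
>                 }
>             }
>         }
>     }
> }
> }}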
> Observations: This bug only manifests immediately after a high-traffic column
> family flush appears in the log. This is a correlation based simply on
> watching the log; there are no errors or warnings of any kind.
> Workaround: Any multi-column read is potentially invalid and the corruption is
> virtually undetectable. The only workaround is to never write or read more
> than a single column per query.
> I have a simple Groovy script that can trigger the error. I have verified the
> behavior on Cassandra versions as old as 0.8.1.