[ 
https://issues.apache.org/jira/browse/CASSANDRA-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710821#comment-14710821
 ] 

Stefania commented on CASSANDRA-10159:
--------------------------------------

I've attached the log file of the node that caused the problem on Jenkins. In 
this specific case, the node failed to complete the transaction on restart 
because we checked the last update time even when no files exist, which is 
what we fixed here. This highlights another problem, however: if for any reason 
we cannot complete a transaction, chances are we won't be able to list 
temporary files for this table either, regardless of the number of attempts. So 
my idea of increasing MAX_ATTEMPTS is not actually required. This observation 
increases the importance of CASSANDRA-10112. If we decide to carry on, corrupt 
log files should be stashed or removed.
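
For illustration only, here is a minimal sketch of the kind of guard described 
above. It is not the actual patch: the class and method names 
(LastUpdateCheck, isConsistent) are hypothetical, and treating "no files on 
disk" as consistent is an assumed policy. The one fact it relies on is that 
java.io.File.lastModified() returns 0L for a file that does not exist, which 
is why the log reports a last update time of Jan 01 1970.

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not taken from TransactionLog.java.
class LastUpdateCheck {

    /**
     * Compares the update time stored in a log record with the newest
     * modification time of the files that are still on disk.
     * If none of the files exist, there is nothing to compare against,
     * so the record is treated as consistent (assumed policy).
     */
    static boolean isConsistent(List<File> files, long recordedMillis) {
        long latest = 0L;
        boolean anyExists = false;
        for (File f : files) {
            if (!f.exists())
                continue; // lastModified() would return 0L here, i.e. the 1970 epoch
            anyExists = true;
            latest = Math.max(latest, f.lastModified());
        }
        if (!anyExists)
            return true; // skip the time check when no files exist
        return latest == recordedMillis;
    }

    public static void main(String[] args) {
        // A file name that is assumed not to exist in the working directory.
        File missing = new File("ma-2-big-Data.db");
        List<File> files = Collections.singletonList(missing);
        System.out.println("lastModified of missing file: " + missing.lastModified());
        System.out.println("consistent: " + isConsistent(files, 1440329048000L));
    }
}
```

Without the exists() check, the comparison would be 0 against 1440329048000 
and the record would be flagged as possible disk corruption, as in the 
attached log.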

> Incorrect last update time causes dtest to fail due to unexpected errors
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10159
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10159
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.0.0 rc1
>
>         Attachments: node2.log
>
>
> Some dtests are failing as follows:
> http://cassci.datastax.com/job/cassandra-3.0_dtest/96/testReport/counter_tests/TestCounters/upgrade_test/
> {code}
> Unexpected error in node2 node log: ['ERROR [main] 2015-08-23 11:25:52,701 
> TransactionLog.java:246 - Possible disk corruption detected for sstable 
> [ma-2-big], record [REMOVE:[ma-2-big,1440329048000,8]]: last update time [Thu 
> Jan 01 00:00:00 UTC 1970] should have been [Sun Aug 23 11:24:08 UTC 2015] 
> ERROR [main] 2015-08-23 11:25:52,709 TransactionLog.java:992 - Possible disk 
> corruption: failed to read transaction log 
> /mnt/tmp/dtest-E0OvQC/test/node2/data/system/local-7ad54392bcdd35a684174e047860b377/ma_txn_compaction_90eda9f0-4989-11e5-86bd-f32569933441.log
>  
> org.apache.cassandra.db.lifecycle.TransactionLog$CorruptTransactionLogException:
>  Failed to verify transaction 90eda9f0-4989-11e5-86bd-f32569933441 record 
> [REMOVE:[ma-2-big,1440329048000,8]]: possible disk corruption, aborting \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog$TransactionFile.readRecords(TransactionLog.java:349)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog$TransactionData.readLogFile(TransactionLog.java:574)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog.removeUnfinishedLeftovers(TransactionLog.java:988)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:548)
>  [main/:na] \tat 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:584)
>  [main/:na] \tat 
> org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) 
> [main/:na] \tat 
> org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) 
> [main/:na] \tat 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:166) 
> [main/:na] \tat 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516)
>  [main/:na] \tat 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) 
> [main/:na] ERROR [main] 2015-08-23 11:25:52,710 TransactionLog.java:998 - 
> Failed to remove unfinished transaction leftovers 
> org.apache.cassandra.db.lifecycle.TransactionLog$CorruptTransactionLogException:
>  Failed to verify transaction 90eda9f0-4989-11e5-86bd-f32569933441 record 
> [REMOVE:[ma-2-big,1440329048000,8]]: possible disk corruption, aborting \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog$TransactionFile.readRecords(TransactionLog.java:349)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog$TransactionData.readLogFile(TransactionLog.java:574)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.TransactionLog.removeUnfinishedLeftovers(TransactionLog.java:988)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.removeUnfinishedLeftovers(LifecycleTransaction.java:548)
>  [main/:na] \tat 
> org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:584)
>  [main/:na] \tat 
> org.apache.cassandra.service.StartupChecks$7.execute(StartupChecks.java:274) 
> [main/:na] \tat 
> org.apache.cassandra.service.StartupChecks.verify(StartupChecks.java:103) 
> [main/:na] \tat 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:166) 
> [main/:na] \tat 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516)
>  [main/:na] \tat 
> {code}
> My best guess is that before reading the update time we should check that the 
> file actually exists.



