[ https://issues.apache.org/jira/browse/CASSANDRA-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229849#comment-15229849 ]
Stefania commented on CASSANDRA-11470:
--------------------------------------
I fixed a problem in {{LogAwareFileLister.classifyFiles}} that would have
affected transactions spread across multiple disks. However, this is not what
is causing these tests to fail, unless they use multiple disks and I haven't
noticed.
I've also fixed another problem with the existing logic: to determine whether
there is a race, we should only check that OLD sstable files still exist; we
don't need to look at NEW sstable files. Since we cannot tell in advance how
many NEW sstable files will be created and when, this might explain the
problems we are seeing, although I would have expected them on 3.0 as well.
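To illustrate the second fix, here is a minimal sketch of an OLD-only
check. It is not the code in the patch: the class {{OldFileRaceCheck}}, the
{{Verdict}} enum and the use of plain file-name sets are all assumptions made
for illustration. The point is only that NEW files recorded in the
transaction are deliberately ignored when deciding whether the directory
listing may have raced with the transaction.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Cassandra.
final class OldFileRaceCheck
{
    enum Verdict { IN_PROGRESS, POSSIBLE_RACE }

    /**
     * @param oldRecorded OLD sstable files named in the transaction log
     * @param newRecorded NEW sstable files named in the transaction log
     * @param listed      files returned by the directory listing
     */
    static Verdict check(Set<String> oldRecorded, Set<String> newRecorded, Set<String> listed)
    {
        // newRecorded is intentionally not consulted: NEW files can appear at
        // any time while the transaction is in progress, so their presence or
        // absence says nothing about a race.
        return listed.containsAll(oldRecorded) ? Verdict.IN_PROGRESS : Verdict.POSSIBLE_RACE;
    }

    public static void main(String[] args)
    {
        Set<String> oldFiles = new HashSet<>(Arrays.asList("old-1", "old-2"));
        Set<String> newFiles = new HashSet<>(Arrays.asList("new-1", "new-2"));

        // All OLD files listed, one NEW file not yet created: still in progress.
        Set<String> listing = new HashSet<>(Arrays.asList("old-1", "old-2", "new-1"));
        System.out.println(check(oldFiles, newFiles, listing));

        // An OLD file is missing: the listing may have raced with the transaction.
        Set<String> racy = new HashSet<>(Arrays.asList("old-1", "new-1", "new-2"));
        System.out.println(check(oldFiles, newFiles, racy));
    }
}
```

In this sketch a POSSIBLE_RACE verdict would be the trigger for re-checking
the transaction before declaring the disk state inconsistent.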
Finally, if we fail to classify files due to an inconsistent disk state, I've
added a log message showing the list of files detected and the content of the
transaction.
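As a rough idea of what such a diagnostic could contain, the sketch below
builds a message from the listed files and the transaction records. The class
name, method name and message wording are hypothetical and are not taken from
the patch; the real log output may look different.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and wording are assumptions.
final class InconsistentStateMessage
{
    static String describe(String directory, List<String> filesListed, List<String> txnRecords)
    {
        StringBuilder sb = new StringBuilder();
        sb.append("Failed to classify files in ").append(directory).append('\n');

        // What the directory listing returned at the time of the failure.
        sb.append("Files listed:\n");
        for (String file : filesListed)
            sb.append("  ").append(file).append('\n');

        // The raw records of the transaction log, one per line.
        sb.append("Transaction content:\n");
        for (String record : txnRecords)
            sb.append("  ").append(record).append('\n');

        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.print(describe("some-directory",
                                  Arrays.asList("file-a", "file-b"),
                                  Arrays.asList("record-1", "record-2")));
    }
}
```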
Patch and CI results for trunk:
|[patch|https://github.com/stef1927/cassandra/commits/11470]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11470-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-11470-dtest/]|
Depending on the CI results, we may need to run {{base_replica_repair_test}}
multiple times to see if the problem is fixed and to gather more information if
it isn't.
> dtest failure in
> materialized_views_test.TestMaterializedViews.base_replica_repair_test
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-11470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11470
> Project: Cassandra
> Issue Type: Bug
> Reporter: Philip Thompson
> Assignee: Stefania
> Labels: dtest
> Fix For: 3.x
>
> Attachments: node1.log, node2.log, node2_debug.log, node3.log,
> node3_debug.log
>
>
> base_replica_repair_test has failed on trunk with the following exception in
> the log of node2:
> {code}
> ERROR [main] 2016-03-31 08:48:46,949 CassandraDaemon.java:708 - Exception
> encountered during startup
> java.lang.RuntimeException: Failed to list files in
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
> ~[main/:na]
> at
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
> ~[main/:na]
> at
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:725)
> ~[main/:na]
> at
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:690)
> ~[main/:na]
> at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:567)
> ~[main/:na]
> at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:555)
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:383)
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:320)
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:130)
> ~[main/:na]
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:107)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:139)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:864)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:811)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:799)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:505)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:242)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.QueryProcessor.prepareInternal(QueryProcessor.java:286)
> ~[main/:na]
> at
> org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:294)
> ~[main/:na]
> at
> org.apache.cassandra.schema.SchemaKeyspace.query(SchemaKeyspace.java:1246)
> ~[main/:na]
> at
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:875)
> ~[main/:na]
> at
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:867)
> ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:134)
> ~[main/:na]
> at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:124)
> ~[main/:na]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:229)
> [main/:na]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
> [main/:na]
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691)
> [main/:na]
> Caused by: java.lang.RuntimeException: Failed to list directory files in
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985,
> inconsistent disk state for transaction
> [ma_txn_flush_58db56b0-f71d-11e5-bf68-03a01adb9f11.log in
> /mnt/tmp/dtest-du964e/test/node2/data0/system_schema/views-9786ac1cdd583201a7cdad556410c985]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:149)
> ~[main/:na]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:103)
> ~[main/:na]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister$$Lambda$48/35984028.accept(Unknown
> Source) ~[na:na]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> ~[na:1.8.0_45]
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> ~[na:1.8.0_45]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> ~[na:1.8.0_45]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
> ~[na:1.8.0_45]
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
> ~[na:1.8.0_45]
> at
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> ~[na:1.8.0_45]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> ~[na:1.8.0_45]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> ~[na:1.8.0_45]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> ~[na:1.8.0_45]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:71)
> ~[main/:na]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:49)
> ~[main/:na]
> ... 25 common frames omitted
> {code}
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1092/testReport/materialized_views_test/TestMaterializedViews/base_replica_repair_test
> Failed on CassCI build trunk_dtest #1092
> I've attached the logs from the failure in build 1092.