[
https://issues.apache.org/jira/browse/CASSANDRA-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefania updated CASSANDRA-11889:
---------------------------------
    Resolution: Fixed
Fix Version/s:     (was: 3.0.x)
                   (was: 3.x)
                   3.10
                   3.0.9
        Status: Resolved  (was: Patch Available)
> LogRecord: file system race condition may cause verify() to fail
> ----------------------------------------------------------------
>
> Key: CASSANDRA-11889
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11889
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.0.9, 3.10
>
>
> The following exception was reported in CASSANDRA-11470. It occurred whilst
> listing files with compaction in progress:
> {code}
> WARN [CompactionExecutor:2006] 2016-05-23 18:23:31,694
> BigTableWriter.java:171 - Writing large partition
> test_keyspace/test_columnfamily:eda6b9c36f8df6fe596492c3438d7a38e9b109a6
> (123663388 bytes)
> INFO [IndexSummaryManager:1] 2016-05-23 18:24:23,731
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> WARN [CompactionExecutor:2006] 2016-05-23 18:24:56,669
> BigTableWriter.java:171 - Writing large partition
> test_keyspace/test_columnfamily:05b6b424194dd19ab7cfbcd53c4979768cd859e9
> (256286063 bytes)
> WARN [CompactionExecutor:2006] 2016-05-23 18:26:23,575
> BigTableWriter.java:171 - Writing large partition
> test_keyspace/test_columnfamily:04e9fac15552b9ae77c27a6cb8d0fd11fdcc24d7
> (212445557 bytes)
> INFO [CompactionExecutor:2005] 2016-05-23 18:29:26,839
> LeveledManifest.java:437 - Adding high-level (L3)
> BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61041-big-Data.db')
> to candidates
> WARN [CompactionExecutor:2006] 2016-05-23 18:30:34,154
> BigTableWriter.java:171 - Writing large partition
> test_keyspace/test_columnfamily:edbe6f178503be90911dbf29a55b97a4b095a9ec
> (183852539 bytes)
> INFO [CompactionExecutor:2006] 2016-05-23 18:31:21,080
> LeveledManifest.java:437 - Adding high-level (L3)
> BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61042-big-Data.db')
> to candidates
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207
> LogFile.java:173 - Unexpected files detected for sstable [ma-91034-big],
> record
> [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
> last update time [00:00:00] should have been [08:29:36]
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,208
> ScheduledReporter.java:119 - RuntimeException thrown from
> GraphiteReporter#report. Exception was suppressed.
> java.lang.RuntimeException: Failed to list files in
> /data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:692)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
> ~[metrics-graphite-3.1.0.jar:3.1.0]
> at
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158)
> ~[metrics-graphite-3.1.0.jar:3.1.0]
> at
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
> ~[metrics-core-3.1.0.jar:3.1.0]
> at
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
> ~[metrics-core-3.1.0.jar:3.1.0]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_91]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_91]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_91]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [na:1.8.0_91]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_91]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by:
> org.apache.cassandra.db.lifecycle.LogTransaction$CorruptTransactionLogException:
> Some records failed verification. See earlier in log for details.
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:114)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> ~[na:1.8.0_91]
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> ~[na:1.8.0_91]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> ~[na:1.8.0_91]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> ~[na:1.8.0_91]
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> ~[na:1.8.0_91]
> at
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> ~[na:1.8.0_91]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> ~[na:1.8.0_91]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> ~[na:1.8.0_91]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> ~[na:1.8.0_91]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at
> org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> ... 22 common frames omitted
> INFO [CompactionExecutor:2005] 2016-05-23 18:32:27,284
> LeveledManifest.java:437 - Adding high-level (L3)
> BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98189-big-Data.db')
> to candidates
> INFO [CompactionExecutor:2005] 2016-05-23 18:34:42,562
> LeveledManifest.java:437 - Adding high-level (L3)
> BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98210-big-Data.db')
> to candidates
> {code}
> The error of interest is:
> {code}
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173 - Unexpected files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]: last update time [00:00:00] should have been [08:29:36]
> {code}
> The problem is this code here in LogRecord:
> {code}
> public LogRecord withExistingFiles()
> {
>     return make(type, getExistingFiles(), 0, absolutePath.get());
> }
>
> public static LogRecord make(Type type, List<File> files, int minFiles, String absolutePath)
> {
>     long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);
>     return new LogRecord(type, absolutePath, lastModified, Math.max(minFiles, files.size()));
> }
> {code}
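> The failure mode can be reproduced outside Cassandra, because java.io.File.lastModified() simply returns 0L for a file that no longer exists, while the stale list still has a non-zero size. A minimal standalone sketch (not Cassandra code; the file name is made up and never created, standing in for sstable files deleted mid-flight):
> {code}
> import java.io.File;
> import java.util.Collections;
> import java.util.List;
>
> public class LastModifiedRace
> {
>     // Mirrors the reduction in LogRecord.make(): if every file in the
>     // list has been deleted, the max-reduction falls back to the
>     // identity value 0L, while files.size() still reports the stale,
>     // non-zero count captured before the deletion.
>     static long lastModified(List<File> files)
>     {
>         return files.stream().map(File::lastModified).reduce(0L, Long::max);
>     }
>
>     public static void main(String[] args)
>     {
>         // A file that was listed but has since been deleted.
>         List<File> files = Collections.singletonList(new File("ma-91034-big-Data.db"));
>         System.out.println(lastModified(files)); // 0 -> rendered as [00:00:00]
>         System.out.println(files.size());        // 1 -> numFiles > 0
>     }
> }
> {code}
> This is exactly the inconsistent pair (updateTime == 0, numFiles > 0) that the verification below rejects.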
> If the sstable files are completely deleted after getExistingFiles() has
> returned them, but before lastModified is calculated, we may report the last
> update time as zero (because the files no longer exist) while still indicating
> that there are files on disk. This causes the following check in
> LogFile.verifyRecord to fail:
> {code}
> record.status.onDiskRecord = record.withExistingFiles();
>
> if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles > 0)
> {
>     record.setError(String.format("Unexpected files detected for sstable [%s], " +
>                                   "record [%s]: last update time [%tT] should have been [%tT]",
>                                   record.fileName(),
>                                   record,
>                                   record.status.onDiskRecord.updateTime,
>                                   record.updateTime));
> }
> {code}
> As a consequence, we throw an exception when listing files; in this case the
> listing was triggered while calculating the true snapshot size for the metrics.
> Note that we are careful to delete files from oldest to newest, so the last
> update time should normally match what is stored in the txn log. However, when
> all of a record's files have been deleted, we report the last update time as
> zero. So, if my analysis is correct, this problem should not occur very
> frequently.
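> One way to narrow the window (a sketch only, not necessarily the committed patch; the class and method names below are made up) is to re-filter the list for existence immediately before the reduction, so that the file count and the last-modified time are derived from the same snapshot of the directory:
> {code}
> import java.io.File;
> import java.util.List;
> import java.util.stream.Collectors;
>
> final class LogRecordSketch
> {
>     // Hypothetical defensive variant of LogRecord.make(): drop files
>     // that have vanished before computing either value. If all files
>     // are gone, both the last-modified time and the count report
>     // "nothing on disk", so a verifyRecord-style check no longer sees
>     // updateTime == 0 paired with numFiles > 0.
>     static long[] snapshot(List<File> files)
>     {
>         List<File> existing = files.stream()
>                                    .filter(File::exists)
>                                    .collect(Collectors.toList());
>         long lastModified = existing.stream()
>                                     .map(File::lastModified)
>                                     .reduce(0L, Long::max);
>         return new long[] { lastModified, existing.size() };
>     }
> }
> {code}
> Note this shrinks but does not eliminate the race (a file can still disappear between exists() and lastModified()), so the verification logic would also need to tolerate an update time of zero when it cannot prove that files are still present.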
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)