[
https://issues.apache.org/jira/browse/CASSANDRA-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974376#comment-14974376
]
Fred A edited comment on CASSANDRA-10501 at 10/26/15 3:29 PM:
--------------------------------------------------------------
Will this one be included in 2.1.12?
was (Author: fredderf):
Will this one be included in 1.1.12?
> Failure to start up Cassandra when temporary compaction files are not all
> renamed after kill/crash (FSReadError)
> ----------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10501
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10501
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.1.6
> Redhat Linux
> Reporter: Mathieu Roy
> Assignee: Marcus Eriksson
> Labels: compaction, triage
> Fix For: 2.1.x, 2.2.x, 3.0.0
>
>
> We have seen an issue intermittently but repeatedly over the last few months
> where, after exiting the Cassandra process, it fails to start with an
> FSReadError (stack trace below). The FSReadError refers to a 'statistics'
> file that doesn't exist, though a corresponding temporary file does
> exist (e.g. there is no
> /media/data/cassandraDB/data/clusteradmin/singleton_token-01a92ed069b511e59b2c53679a538c14/clusteradmin-singleton_token-ka-9-Statistics.db
> file, but there is a
> /media/data/cassandraDB/data/clusteradmin/singleton_token-01a92ed069b511e59b2c53679a538c14/clusteradmin-singleton_token-tmp-ka-9-Statistics.db
> file.)
> We tracked down the issue to the fact that the process exited with leftover
> compactions and some of the 'tmp' files for the SSTable had been renamed to
> final files, but not all of them - the issue happens if the 'Statistics' file
> is not renamed but others are. The scenario we've seen on the last two
> occurrences involves the 'CompressionInfo' file being a final file while all
> other files for the SSTable generation were left with 'tmp' names.
> When this occurs, Cassandra cannot start until the file issue is resolved;
> we've worked around it by deleting the SSTable files from the same
> generation, both final and tmp, which at least allows Cassandra to start.
> Renaming all files to either tmp or final names would also work.
> We've done some debugging in Cassandra and have been unable to cause the
> issue without renaming the files manually. The rename code at
> SSTableWriter.rename() looks like it could result in this if the process
> exits in the middle of the rename, but in every occurrence we've debugged
> through, the Set of components is ordered and Statistics is the first file
> renamed.
> However, the comments in SSTableWriter.rename() suggest that the 'Data' file
> is meant to serve as the marker that the files were completely renamed. The
> method ColumnFamilyStore.removeUnfinishedCompactionLeftovers(), however, will
> proceed assuming the compaction is complete if any of the component files has
> a final name, and will skip temporary files when reading the list. If the
> 'Statistics' file is temporary then it won't be read, and the defaults do
> not include a list of ancestors, leading to the NullPointerException.
> It appears that ColumnFamilyStore.removeUnfinishedCompactionLeftovers()
> should perhaps either ensure that all 'tmp' files are properly renamed before
> it uses them, or skip SSTable files that don't have either the 'Data' or
> 'Statistics' file in final form.
> Stack trace:
> {code}
> FSReadError in Failed to remove unfinished compaction leftovers (file:
> /media/data/cassandraDB/data/clusteradmin/singleton_token-01a92ed069b511e59b2c53679a538c14/clusteradmin-singleton_token-ka-9-Statistics.db).
> See log for details.
> at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:617)
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302)
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536)
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625)
> Caused by: java.lang.NullPointerException
> at
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:609)
> ... 3 more
> Exception encountered during startup: java.lang.NullPointerException
> {code}
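
For anyone hitting this before a fix ships, here is a rough sketch of how the broken state described above (an SSTable generation with some components in final form and others still named 'tmp') could be spotted from a directory listing before restarting. It is only an illustration, not Cassandra code: the function name find_mixed_generations is made up, the regular expression assumes the 2.1-style names shown in the report (<keyspace>-<table>-[tmp-]ka-<generation>-<Component>.<ext>), and it assumes the listing comes from a single table directory.

```python
import re
from collections import defaultdict

# Assumed 2.1-era naming, taken from the paths quoted in the report:
#   <keyspace>-<table>-[tmp-]<version>-<generation>-<Component>.<ext>
SSTABLE_NAME = re.compile(
    r"^(?P<keyspace>\w+)-(?P<table>\w+)-(?:(?P<tmp>tmp)-)?"
    r"(?P<version>[a-z]{2})-(?P<generation>\d+)-(?P<component>\w+)\.\w+$"
)


def find_mixed_generations(filenames):
    """Return generations that have both final and tmp component files.

    The result maps generation number -> {"final": [...], "tmp": [...]},
    listing component names in the order they were seen.
    Hypothetical helper; assumes all names come from one table directory.
    """
    by_generation = defaultdict(lambda: {"final": [], "tmp": []})
    for name in filenames:
        match = SSTABLE_NAME.match(name)
        if match is None:
            continue  # not an SSTable component file in the assumed format
        kind = "tmp" if match.group("tmp") else "final"
        generation = int(match.group("generation"))
        by_generation[generation][kind].append(match.group("component"))
    # Only generations with a mix of tmp and final names are suspicious.
    return {
        generation: components
        for generation, components in by_generation.items()
        if components["final"] and components["tmp"]
    }


# Example listing modelled on the scenario in the report: CompressionInfo
# was renamed to its final name, the other generation-9 files were not.
listing = [
    "clusteradmin-singleton_token-ka-9-CompressionInfo.db",
    "clusteradmin-singleton_token-tmp-ka-9-Statistics.db",
    "clusteradmin-singleton_token-tmp-ka-9-Data.db",
    "clusteradmin-singleton_token-ka-8-Data.db",
    "clusteradmin-singleton_token-ka-8-Statistics.db",
]
mixed = find_mixed_generations(listing)
```

With the sample listing, only generation 9 is reported. What to do with a flagged generation is the operator's call; the report mentions deleting all of its files or renaming them consistently as workarounds.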