[
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283140#comment-15283140
]
Joel Knighton commented on CASSANDRA-11742:
-------------------------------------------
I've confirmed this issue - there's a window between
{{SystemKeyspace.finishStartup()}} and calling {{Gossiper.instance.start()}} in
{{prepareToJoin}} where the contents will only be in the memtable/commitlog and
not flushed.
There doesn't seem to be a clearly better way to refactor
{{SystemKeyspace.checkHealth()}} - ideally, we wouldn't write to {{local}}
before {{finishStartup}}, but that would be a significant refactor without a
very significant reward. Even with your proposed fix, there's a window where we
could crash before we even attempt to write in
{{SystemKeyspace.finishStartup()}}, but I think that's livable.
As an alternative to your patch, [~tommy_s], how would you feel about just
forcing a blocking flush in {{persistLocalMetadata}}? This would ensure the
data is present even if we have a hard crash/kill circumstance where
{{StorageServiceShutdownHook}} doesn't run.
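To make the window concrete, here is a rough, self-contained toy model in Java.
It is only a sketch under stated assumptions, not the actual Cassandra code:
{{ToyTable}}, {{healthCheckPasses}}, {{restartSucceeds}} and the
{{early_value}} key are hypothetical names invented for illustration, and the
health check is simplified to "fail if flushed data exists but
{{cluster_name}} is not in it".

```java
import java.util.HashMap;
import java.util.Map;

public class FlushWindowSketch {
    /** Hypothetical stand-in for a table: writes land in memory until flushed. */
    static final class ToyTable {
        private final Map<String, String> memtable = new HashMap<>();
        private final Map<String, String> sstable = new HashMap<>();

        void write(String key, String value) {
            memtable.put(key, value);
        }

        /** Moves everything currently in memory to the simulated on-disk state. */
        void blockingFlush() {
            sstable.putAll(memtable);
            memtable.clear();
        }

        /** Simulates a hard kill: only flushed data is visible before commitlog replay. */
        ToyTable hardCrash() {
            ToyTable restarted = new ToyTable();
            restarted.sstable.putAll(sstable);
            return restarted;
        }
    }

    /** Simplified check (assumption): files exist but cluster_name is missing => fail. */
    static boolean healthCheckPasses(ToyTable local) {
        return local.sstable.isEmpty() || local.sstable.containsKey("cluster_name");
    }

    /** Runs one startup, a hard crash, and the health check on restart. */
    static boolean restartSucceeds(boolean flushAfterPersist) {
        ToyTable local = new ToyTable();
        local.write("early_value", "x");             // early write (hypothetical key)
        local.blockingFlush();                       // the early write reaches disk
        local.write("cluster_name", "Test Cluster"); // models persisting local metadata
        if (flushAfterPersist) {
            local.blockingFlush();                   // the proposed blocking flush
        }
        return healthCheckPasses(local.hardCrash()); // crash inside the window
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + restartSucceeds(false));
        System.out.println("with flush: " + restartSucceeds(true));
    }
}
```

In this model the restart only succeeds when the flush happens right after the
metadata write, which is why flushing at write time also covers the hard-kill
case that a shutdown hook cannot.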
> Failed bootstrap results in exception when node is restarted
> ------------------------------------------------------------
>
> Key: CASSANDRA-11742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tommy Stendahl
> Assignee: Tommy Stendahl
> Priority: Minor
> Attachments: 11742.txt
>
>
> Since 2.2 a failed bootstrap results in a
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system
> keyspace files, but they couldn't be loaded!}} exception when the node is
> restarted. This did not happen in 2.1, it just tried to bootstrap again. I
> know that the workaround is relatively easy, just delete the system keyspace
> in the data folder on disk and try again, but it's a bit annoying that you
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has
> been moved to just before the bootstrap begins (in 2.1 it was done much
> earlier) and as a result it's still in the memtable and commitlog if the
> bootstrap fails. Still, a few values are inserted into the {{system.local}}
> table at an earlier point in the startup and they have been flushed from the
> memtable to an sstable. When the node is restarted the
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed
> and therefore only sees the sstable with an incomplete {{system.local}} table
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace
> in the {{StorageServiceShutdownHook}}; I have included a patch that does this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)