[ 
https://issues.apache.org/jira/browse/CASSANDRA-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283140#comment-15283140
 ] 

Joel Knighton commented on CASSANDRA-11742:
-------------------------------------------

I've confirmed this issue - there's a window between 
{{SystemKeyspace.finishStartup()}} and calling {{Gossiper.instance.start()}} in 
{{prepareToJoin}} where the newly written {{system.local}} contents will only 
be in the memtable/commitlog and not yet flushed to an sstable.

There doesn't seem to be a clearly better way to refactor 
{{SystemKeyspace.checkHealth()}} - ideally, we wouldn't write to {{local}} 
before {{finishStartup}}, but that would be a significant refactor without a 
very significant reward. Even with your proposed fix, there's a window where we 
could crash before we even attempt to write in 
{{SystemKeyspace.finishStartup()}}, but I think that's livable.

As an alternative to your patch, [~tommy_s], how would you feel about just 
forcing a blocking flush in {{persistLocalMetadata}}? This would ensure the 
data is present even if we have a hard crash/kill circumstance where 
{{StorageServiceShutdownHook}} doesn't run.
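
To make the ordering concrete, here is a minimal toy model of the failure and of the blocking-flush idea. It is only a sketch under stated assumptions: none of these classes or methods are Cassandra code, the column names are placeholders, and the health check is reduced to "sstable files exist but the expected row is missing from them".

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: these are NOT Cassandra classes or APIs. */
public class LocalTableFlushSketch {
    /** Rows that reached an sstable, i.e. visible before commitlog replay. */
    private final Map<String, String> sstable = new HashMap<>();
    /** Rows that so far live only in the memtable/commitlog. */
    private final Map<String, String> memtable = new HashMap<>();

    void write(String column, String value) {
        memtable.put(column, value);
    }

    /** Stand-in for a blocking flush: returns only once rows are "on disk". */
    void blockingFlush() {
        sstable.putAll(memtable);
        memtable.clear();
    }

    /** Hard kill: the restarted process sees only sstables until replay. */
    LocalTableFlushSketch hardKillBeforeReplay() {
        LocalTableFlushSketch restarted = new LocalTableFlushSketch();
        restarted.sstable.putAll(sstable);
        return restarted;
    }

    /**
     * Simplified stand-in for the startup health check: it fails when
     * sstable data exists but the expected metadata row is not in it.
     */
    boolean healthCheckPasses() {
        return sstable.isEmpty() || sstable.containsKey("cluster_name");
    }

    /** Runs one start/crash/restart cycle and reports the restart result. */
    public static boolean restartIsHealthy(boolean flushAfterMetadataWrite) {
        LocalTableFlushSketch node = new LocalTableFlushSketch();
        node.write("early_value", "x");   // hypothetical early write...
        node.blockingFlush();             // ...that did reach an sstable
        node.write("cluster_name", "Test Cluster"); // metadata write (assumed column)
        if (flushAfterMetadataWrite) {
            node.blockingFlush();         // the proposed change
        }
        return node.hardKillBeforeReplay().healthCheckPasses();
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + restartIsHealthy(false));
        System.out.println("with flush: " + restartIsHealthy(true));
    }
}
```

In this model the restart fails without the extra flush and passes with it, regardless of whether any shutdown hook gets a chance to run.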

> Failed bootstrap results in exception when node is restarted
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-11742
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11742
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tommy Stendahl
>            Assignee: Tommy Stendahl
>            Priority: Minor
>         Attachments: 11742.txt
>
>
> Since 2.2, a failed bootstrap results in a 
> {{org.apache.cassandra.exceptions.ConfigurationException: Found system 
> keyspace files, but they couldn't be loaded!}} exception when the node is 
> restarted. This did not happen in 2.1; it just tried to bootstrap again. I 
> know that the workaround is relatively easy, just delete the system keyspace 
> in the data folder on disk and try again, but it's a bit annoying that you 
> have to do that.
> The problem seems to be that the creation of the {{system.local}} table has 
> been moved to just before the bootstrap begins (in 2.1 it was done much 
> earlier) and as a result it's still in the memtable and commitlog if the 
> bootstrap fails. Still, a few values are inserted into the {{system.local}} 
> table at an earlier point in the startup and they have been flushed from the 
> memtable to an sstable. When the node is restarted, 
> {{SystemKeyspace.checkHealth()}} is executed before the commitlog is replayed 
> and therefore only sees the sstable with an incomplete {{system.local}} table 
> and throws an exception.
> I think we could fix this very easily by force-flushing the system keyspace in 
> the {{StorageServiceShutdownHook}}. I have included a patch that does this. 



