[
https://issues.apache.org/jira/browse/CASSANDRA-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620509#comment-14620509
]
Branimir Lambov commented on CASSANDRA-9749:
--------------------------------------------
There are several more things to consider here:
- If a node is powered off while flushing a section to disk, there will be a
read or decompression error on replay. This will be a very normal, frequently
occurring situation. Do we stop/die for it?
- Supposing there's bit rot and the operator does want to recover data in the
other log sections in the same segment, he is now supposed to change the commit
log policy to "ignore" and boot up the cluster with that setting. Do we want to
run the (quite substantial) risk that the operator will not restart it again
and the node stays in an unintended "ignore" commit log policy?
- As mentioned in CASSANDRA-7125, how do we know a table is unknown rather than
dropped?
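
To make the first two questions concrete, here is a minimal sketch of one
possible decision rule. It is not the CommitLogReplayer code. The class, the
enum and both parameters are hypothetical names invented for illustration. It
assumes that an error in the last section of the newest segment is an
interrupted flush and can be skipped, and that any other error fails startup
unless the operator has explicitly asked to ignore replay errors.

```java
// Hypothetical sketch only: none of these names exist in Cassandra.
final class ReplayErrorPolicySketch {

    enum Action { SKIP_SECTION, FAIL_STARTUP }

    /**
     * Decide what to do after a read or decompression error during replay.
     *
     * @param tailOfNewestSegment true if the failing section is the last
     *        section of the newest segment (assumed to be a flush that was
     *        interrupted by power-off)
     * @param ignoreReplayErrors  true if the operator explicitly opted in
     *        to skipping damaged sections
     */
    static Action decide(boolean tailOfNewestSegment, boolean ignoreReplayErrors) {
        if (tailOfNewestSegment) {
            // Assumed benign: the write never completed, so nothing
            // acknowledged is being dropped.
            return Action.SKIP_SECTION;
        }
        // Damage anywhere else means data loss, so require an explicit opt-in.
        return ignoreReplayErrors ? Action.SKIP_SECTION : Action.FAIL_STARTUP;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));
        System.out.println(decide(false, false));
        System.out.println(decide(false, true));
    }
}
```

The sketch does not address the second concern: if ignoreReplayErrors comes
from a persistent setting, the node can still be left in "ignore" by accident.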
> CommitLogReplayer continues startup after encountering errors
> -------------------------------------------------------------
>
> Key: CASSANDRA-9749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9749
> Project: Cassandra
> Issue Type: Bug
> Reporter: Blake Eggleston
> Assignee: Branimir Lambov
> Fix For: 2.2.0 rc2
>
>
> There are a few places where the commit log recovery method either skips
> sections or just returns when it encounters errors.
> Specifically if it can't read the header here:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L298
> Or if there are compressor problems here:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L314
> and here:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L366
> Whether these are user-fixable or not, I think we should require more direct
> user intervention (i.e., fix what's wrong, or remove the bad file and restart)
> since we're basically losing data.