I see this as a strong improvement to Cassandra management and support it.

+1 (non-binding)

On Mon, Dec 11, 2023 at 8:28 PM Raymond Huffman <raymondmhuff...@gmail.com>
wrote:

> Hello All,
>
> On our fork of Cassandra, we've implemented some custom behavior for
> handling CommitLog and SSTable corruption errors. Specifically, if a node
> detects one of those errors, we want the node to stop itself, and if the
> node is restarted, we want initialization to fail. This is useful in
> Kubernetes when you expect nodes to be restarted frequently and makes our
> corruption remediation workflows less error-prone. I think we could make
> this behavior more pluggable by allowing users to provide, via config,
> custom implementations of the FSErrorHandler and of the error handler
> currently implemented at
> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError, in the same
> way one can provide custom Partitioners and Authenticators/Authorizers.
>
> Would you accept one or both of the following as a contribution?
> 1. user provided implementations of FSErrorHandler and
> CommitLogErrorHandler, set via config; and/or
> 2. new commit failure and disk failure policies that write a poison pill
> file to disk and fail on startup if that file exists
>
> The poison pill implementation is what we currently use - we call this a
> "Non Transient Error" and we want these states to always require manual
> intervention to resolve, including manual action to clear the error. I'd be
> happy to contribute this if other users would find it beneficial. I had
> initially shared this question in Slack, but I'm now sharing it here for
> broader visibility.
>
> -Raymond Huffman
>
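A minimal sketch of how the two proposals could fit together: a handler class named in config is loaded by class name, and a poison-pill implementation writes a marker file on a non-transient error and refuses startup while that file exists. Everything here is an illustrative assumption. NonTransientErrorHandler, PoisonPillErrorHandler, ErrorHandlers, the constructor taking the pill path, and the marker-file location are hypothetical; they are not Cassandra's actual FSErrorHandler or CommitLog APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pluggable hook for non-transient errors (not an existing Cassandra interface).
interface NonTransientErrorHandler {
    void handle(Throwable error);
}

// Hypothetical poison-pill handler: records the error on disk so a restarted node refuses to start.
final class PoisonPillErrorHandler implements NonTransientErrorHandler {
    private final Path pillFile;

    PoisonPillErrorHandler(Path pillFile) {
        this.pillFile = pillFile;
    }

    @Override
    public void handle(Throwable error) {
        try {
            Path parent = pillFile.toAbsolutePath().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            // Persist a description of the error; the file's presence is what blocks startup.
            Files.writeString(pillFile, String.valueOf(error));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A real handler would now stop the node; this sketch only writes the pill.
    }

    // Called during startup: fail initialization while the pill file exists.
    void checkOnStartup() {
        if (Files.exists(pillFile)) {
            throw new IllegalStateException(
                "Non-transient error recorded in " + pillFile + "; manual intervention required");
        }
    }
}

// Instantiates the handler class named in config (hypothetical: assumes a constructor taking the pill path).
final class ErrorHandlers {
    private ErrorHandlers() {}

    static NonTransientErrorHandler load(String className, Path pillFile)
            throws ReflectiveOperationException {
        Class<? extends NonTransientErrorHandler> cls =
            Class.forName(className).asSubclass(NonTransientErrorHandler.class);
        return cls.getDeclaredConstructor(Path.class).newInstance(pillFile);
    }
}
```

Stopping the node after the pill is written, and deleting the pill as the recovery step, are deliberately left out: as described above, clearing a non-transient error is meant to be a manual operator action.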
