Thanks for circling back and posting your experience!
>
Hi all,
I wanted to report back on this thread with a bit of a "win". In investigating
ways to mitigate the "corruption on hard shutdown" issue, we came across
the Group Commitlog feature that was added in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-13530). We backported and
enabled this
Following up, I've found that we tend to encounter one of three types of
exceptions/commitlog corruptions:
1.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Mutation checksum failure at ... in CommitLog-5-1531150627243.log
at
Thanks for the links/comments Jeff and Bowen.
We run xfs. Not sure that we can switch to zfs, so a different solution
would be preferred.
I’ll take a look through that patch – maybe I’ll try to backport and
replicate. We’ve seen both cases where the commitlog is just 0s (empty)
and where it has
The commitlog code has changed DRASTICALLY between 2.x and trunk.
If it's really a bunch of trailing 0s as was suggested later, then
https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least
one cause/case of that particular bug.
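For anyone who wants to check the "trailing 0s" case without opening a hex editor, here is a minimal sketch in Python. It only counts how many 0x00 bytes sit at the end of a file; the function name and the chunk size are my own choices for illustration, and it knows nothing about the actual commitlog segment format.

```python
import os


def count_trailing_zero_bytes(path, chunk_size=1 << 20):
    """Return the number of consecutive 0x00 bytes at the end of a file.

    Hypothetical helper for illustration only: it does not parse the
    commitlog format, it just scans the raw bytes backwards.
    """
    size = os.path.getsize(path)
    trailing = 0
    with open(path, "rb") as f:
        pos = size
        while pos > 0:
            # Read the file in chunks, starting from the end.
            read_len = min(chunk_size, pos)
            pos -= read_len
            f.seek(pos)
            chunk = f.read(read_len)
            stripped = chunk.rstrip(b"\x00")
            trailing += len(chunk) - len(stripped)
            if stripped:
                # Hit a non-zero byte, so the zero run ends here.
                break
    return trailing
```

If the result equals the file size, the segment is all zeros (the "empty" case); otherwise it tells you where the zero run starts. As far as I know, the unused tail of a preallocated segment can also be zero-filled, so a long zero tail is a hint rather than proof of corruption.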
On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky
I have seen the same error in Cassandra 3.x too, and in fact quite a few
times. On a few occasions, I opened the corrupted commit log file in a
hex editor, and it was filled with lots of 0x00s. I believe it was
caused by the combination of the way Cassandra flushes the commit log +
the way
And for completeness, a sample stack trace:
ERROR [2021-07-21T02:11:01.994Z]
org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay.
Commit disk failure policy is stop_on_startup; terminating thread
(throwable0_message: Mutation checksum failure at 15167277 in
Currently we're using batch commitlog sync:
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
commitlog_segment_size_in_mb: 32
durable_writes is also true.
Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious
if much in this space has changed since then
What commitlog settings are you using?
Default is periodic with 10s sync. That leaves you a 10s window on hard
poweroff/crash.
I would also expect Cassandra to clean up and start cleanly. Which version
are you running?
On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky
wrote:
> Hi Cassandra
I thought durable_writes is the solution.
-Arvinder
On Mon, Jul 26, 2021, 1:00 PM Leon Zaruvinsky
wrote:
> Hi Cassandra community,
>
> We (and others) regularly run into commit log corruptions that are caused
> by Cassandra, or the underlying infrastructure, being hard restarted. I
> suspect