Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Hi Cassandra community, We (and others) regularly run into commit log corruptions that are caused by Cassandra, or the underlying infrastructure, being hard restarted. I suspect that this is because it happens in the middle of a commitlog file write to disk. Could anyone point me at resources /

Re: [RELEASE] Apache Cassandra 4.0.0 released

2021-07-26 Thread Joe Obernberger
Whoo hoo! Looking forward to trying it out! -Joe On 7/26/2021 4:03 PM, Brandon Williams wrote: The Cassandra team is pleased to announce the release of Apache Cassandra version 4.0.0. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high

[RELEASE] Apache Cassandra 4.0.0 released

2021-07-26 Thread Brandon Williams
The Cassandra team is pleased to announce the release of Apache Cassandra version 4.0.0. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Arvinder Dhillon
I thought durable_writes is the solution. -Arvinder On Mon, Jul 26, 2021, 1:00 PM Leon Zaruvinsky wrote: > Hi Cassandra community, > > We (and others) regularly run into commit log corruptions that are caused > by Cassandra, or the underlying infrastructure, being hard restarted. I > suspect

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Jeff Jirsa
What commitlog settings are you using? The default is periodic with a 10s sync, which leaves you a 10s window on hard poweroff/crash. I would also expect Cassandra to clean up and start cleanly. Which version are you running? On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky wrote: > Hi Cassandra

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Currently we're using commitlog_batch:
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 2
    commitlog_segment_size_in_mb: 32
durable_writes is also true. Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious if much in this space has changed since then

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
And for completeness, a sample stack trace: ERROR [2021-07-21T02:11:01.994Z] org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. Commit disk failure policy is stop_on_startup; terminating thread (throwable0_message: Mutation checksum failure at 15167277 in

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Bowen Song
I have seen the same error in Cassandra 3.x too, and in fact quite a few times. On a few occasions, I opened the corrupted commit log file in a hex editor, and it was filled with a lot of 0x00s. I believe it was caused by the combination of the way Cassandra flushes the commit log + the way
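
The hex-editor check described above can be approximated with a short script. This is a minimal sketch, not a Cassandra tool: the function name `zero_byte_summary` and the example file name are hypothetical, and the code knows nothing about the commit log format. It only counts 0x00 bytes in a file and reports how many of them form the trailing run, which may help tell a zero-filled segment apart from one that is damaged in some other way.

```python
def zero_byte_summary(path, chunk_size=1 << 20):
    """Return (total_bytes, zero_bytes, trailing_zero_bytes) for a file.

    Hypothetical helper for illustration; it treats the file as opaque bytes
    and does not parse or validate Cassandra commit log structures.
    """
    total = zeros = trailing = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            zeros += chunk.count(b"\x00")
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # A non-zero byte appeared, so the trailing run restarts here.
                trailing = len(chunk) - len(stripped)
            else:
                # The whole chunk is zeros, so the trailing run keeps growing.
                trailing += len(chunk)
    return total, zeros, trailing


if __name__ == "__main__":
    import sys

    # Usage (file name is an assumption): python zero_check.py CommitLog-segment.log
    for name in sys.argv[1:]:
        total, zeros, trailing = zero_byte_summary(name)
        print(f"{name}: {total} bytes, {zeros} zero bytes, {trailing} trailing zero bytes")
```

A segment where the trailing count equals the total size is entirely zeros. Note that a healthy, partially written segment may also end in zero padding, so a large trailing run by itself is not proof of corruption.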

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Jeff Jirsa
The commitlog code has changed DRASTICALLY between 2.x and trunk. If it's really a bunch of trailing 0s as was suggested later, then https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least one cause/case of that particular bug. On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Thanks for the links/comments Jeff and Bowen. We run xfs. Not sure that we can switch to zfs, so a different solution would be preferred. I’ll take a look through that patch – maybe I’ll try to backport and replicate. We’ve seen both cases where the commitlog is just 0s (empty) and where it has