Re: Cassandra commitlog corruption on hard shutdown

2022-04-04 Thread Leon Zaruvinsky
Hi all,

I wanted to report back to this thread with a bit of a "win".  In investigating
ways to mitigate the "corruption on hard shutdown" issue, we came across
the Group Commitlog feature that was added in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-13530).  We backported and
enabled this feature with "commitlog_sync_group_window_in_ms: 2", and the
results are:
- As expected, IOPS on the commitlog drive dropped drastically and no
longer scaled with the number of writes.
- Write performance did not change significantly, and there was no impact
to our application (Cassandra write latencies of >2 ms did not seem to be a
bottleneck for us).
- We've had *zero* commitlog corruption errors since we rolled this out to
our fleet 6 months ago!! Previously, using batch commitlog, we faced 1-2
corruptions per month.

Cheers,
Leon


On Tue, Aug 3, 2021 at 11:39 PM Leon Zaruvinsky 
wrote:

> Following up, I've found that we tend to encounter one of three types of
> exceptions/commitlog corruptions:
>
> 1.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Mutation checksum failure at ... in CommitLog-5-1531150627243.log
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
>
> 2.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Could not read commit log descriptor in file CommitLog-5-1550003067433.log
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
>
> 3.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Encountered bad header at position ... of commit log
> CommitLog-5-1603991140803.log, with invalid CRC. The end of segment marker
> should be zero.
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
>
> I believe exception (2) is mitigated by
> https://issues.apache.org/jira/browse/CASSANDRA-11995 and
> https://issues.apache.org/jira/browse/CASSANDRA-13918
>
> But it's not clear to me how (1) and (3) can be mitigated.
>
> On Mon, Jul 26, 2021 at 6:40 PM Leon Zaruvinsky 
> wrote:
>
>> Thanks for the links/comments Jeff and Bowen.
>>
>> We run xfs. Not sure that we can switch to zfs, so a different solution
>> would be preferred.
>>
>> I’ll take a look through that patch – maybe I’ll try to backport and
>> replicate.  We’ve seen both cases where the commitlog is just 0s (empty)
>> and where it has had real data in it.
>>
>> Leon
>>
>> On Mon, Jul 26, 2021 at 6:38 PM Jeff Jirsa  wrote:
>>
>>> The commitlog code has changed DRASTICALLY between 2.x and trunk.
>>>
>>> If it's really a bunch of trailing 0s as was suggested later, then
>>> https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at
>>> least one cause/case of that particular bug.
>>>
>>>
>>>
>>> On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky <
>>> leonzaruvin...@gmail.com> wrote:
>>>
>>>> And for completeness, a sample stack trace:
>>>>
>>>> ERROR [2021-07-21T02:11:01.994Z] 
>>>> org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. 
>>>> Commit disk failure policy is stop_on_startup; terminating thread 
>>>> (throwable0_message: Mutation checksum failure at 15167277 in 
>>>> CommitLog-5-1626828286977.log)
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>>>>  Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
>>>>at 
>>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
>>>>    at 
>>>> org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
>>>>at 
>>>> org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
>>>>at 
>>>> org.apache.cassandra.service.CassandraDaemon

Re: Faster bulk keyspace creation

2022-03-09 Thread Leon Zaruvinsky
Hi Bowen,

Haha, I agree with you on wanting fewer keyspaces, but unfortunately we're
kind of locked into our architecture for the time being.

We do part of what you're saying, in that we shut down all but one node and
then run the CREATE statements against that single node.  But we do that
serially, so it takes O(number of keyspaces) time.  If we were to submit the
CREATE statements in parallel, is your claim that Cassandra would process
these in parallel as well?
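
For concreteness, something like the following is what I have in mind: a
rough sketch using the Python driver, where the contact point, keyspace
names, and replication settings are all placeholders.

from cassandra.cluster import Cluster

# Placeholders: the single node left running, and keyspace names taken from
# the backup manifest.
keyspaces = ["ks_%04d" % i for i in range(500)]

cluster = Cluster(contact_points=["10.0.0.1"])
session = cluster.connect()

# Fire every CREATE at the same coordinator asynchronously, then wait for
# all of them.  (In practice we'd probably bound the number of in-flight
# statements rather than submitting everything at once.)
futures = [
    session.execute_async(
        "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 3}" % ks
    )
    for ks in keyspaces
]
for f in futures:
    f.result()  # raises if any CREATE failed

cluster.shutdown()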

Thanks,
Leon

On Wed, Mar 9, 2022 at 12:46 PM Bowen Song  wrote:

> First of all, you really shouldn't have that many keyspaces. Putting that
> aside, the quickest way to create a large number of keyspaces without
> causing schema disagreement is to create them in parallel over a
> connection pool with a number of connections, all against the same single
> Cassandra node. Because all CREATE KEYSPACE statements are sent to the
> same node, you don't need to worry about the schema disagreement they may
> cause, as the server side will internally ensure the consistency of the
> schema.
>
> On 09/03/2022 18:35, Leon Zaruvinsky wrote:
> > Hey folks,
> >
> > A step in our Cassandra restore process is to re-create every keyspace
> > that existed in the backup in a brand new cluster.  Because these
> > creations are sequential, and because we have _a lot_ of keyspaces,
> > this ends up being the slowest part of our restore.  We already have
> > some optimizations in place to speed up schema agreement after each
> > create, but even so we'd like to get the time down significantly more.
> >
> > I was curious if anyone has any guidance or has experimented with ways
> > of creating keyspaces that are faster than a bunch of CREATE calls.
> > It's fine for the cluster to be offline during the process.
> >
> > Thanks,
> > Leon
>


Faster bulk keyspace creation

2022-03-09 Thread Leon Zaruvinsky
Hey folks,

A step in our Cassandra restore process is to re-create every keyspace that
existed in the backup in a brand new cluster.  Because these creations are
sequential, and because we have _a lot_ of keyspaces, this ends up being
the slowest part of our restore.  We already have some optimizations in
place to speed up schema agreement after each create, but even so we'd like
to get the time down significantly more.

I was curious if anyone has any guidance or has experimented with ways of
creating keyspaces that are faster than a bunch of CREATE calls.  It's fine
for the cluster to be offline during the process.

Thanks,
Leon


Re: Cassandra commitlog corruption on hard shutdown

2021-08-03 Thread Leon Zaruvinsky
Following up, I've found that we tend to encounter one of three types of
exceptions/commitlog corruptions:

1.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Mutation checksum failure at ... in CommitLog-5-1531150627243.log
at
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)

2.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Could not read commit log descriptor in file CommitLog-5-1550003067433.log
at
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)

3.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Encountered bad header at position ... of commit log
CommitLog-5-1603991140803.log, with invalid CRC. The end of segment marker
should be zero.
at
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)

I believe exception (2) is mitigated by
https://issues.apache.org/jira/browse/CASSANDRA-11995 and
https://issues.apache.org/jira/browse/CASSANDRA-13918

But it's not clear to me how (1) and (3) can be mitigated.

On Mon, Jul 26, 2021 at 6:40 PM Leon Zaruvinsky 
wrote:

> Thanks for the links/comments Jeff and Bowen.
>
> We run xfs. Not sure that we can switch to zfs, so a different solution
> would be preferred.
>
> I’ll take a look through that patch – maybe I’ll try to backport and
> replicate.  We’ve seen both cases where the commitlog is just 0s (empty)
> and where it has had real data in it.
>
> Leon
>
> On Mon, Jul 26, 2021 at 6:38 PM Jeff Jirsa  wrote:
>
>> The commitlog code has changed DRASTICALLY between 2.x and trunk.
>>
>> If it's really a bunch of trailing 0s as was suggested later, then
>> https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least
>> one cause/case of that particular bug.
>>
>>
>>
>> On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky 
>> wrote:
>>
>>> And for completeness, a sample stack trace:
>>>
>>> ERROR [2021-07-21T02:11:01.994Z] 
>>> org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. 
>>> Commit disk failure policy is stop_on_startup; terminating thread 
>>> (throwable0_message: Mutation checksum failure at 15167277 in 
>>> CommitLog-5-1626828286977.log)
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>>>  Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
>>> at 
>>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222)
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>>> at 
>>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:741)
>>>
>>>
>>> On Mon, Jul 26, 2021 at 6:08 PM Leon Zaruvinsky <
>>> leonzaruvin...@gmail.com> wrote:
>>>
>>>> Currently we're using commitlog_batch:
>>>>
>>>> commitlog_sync: batch
>>>> commitlog_sync_batch_window_in_ms: 2
>>>> commitlog_segment_size_in_mb: 32
>>>>
>>>> durable_writes is also true.
>>>>
>>>> Unfortunately we are still using Cassandra 2.2.x :( Though I'd be
>>>> curious if much in this space has changed since then (I've looked through
>>>> the changelogs and nothing stood out).
>>>>
>>>> On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa  wrote:
>>>>
>>>>> What commitlog settings are you using?
>>>>>
>>>>> Default is periodic with 10s sync. That leaves you a 10s window on
>>>>> hard poweroff/crash.
>>>>>
>>

Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Thanks for the links/comments Jeff and Bowen.

We run xfs. Not sure that we can switch to zfs, so a different solution
would be preferred.

I’ll take a look through that patch – maybe I’ll try to backport and
replicate.  We’ve seen both cases where the commitlog is just 0s (empty)
and where it has had real data in it.
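
For reference, the way we classify a corrupt segment is crude; roughly the
sketch below (the path is just an example), which only tells an all-zero
file apart from one that still contains data:

# Is a commitlog segment entirely zero bytes, or does it still hold data?
# (The path below is an example, not a real segment of ours.)
path = "/var/lib/cassandra/commitlog/CommitLog-5-1626828286977.log"
with open(path, "rb") as f:
    data = f.read()
if data.strip(b"\x00"):
    print("segment still contains non-zero data")
else:
    print("segment is all zeros (%d bytes)" % len(data))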

Leon

On Mon, Jul 26, 2021 at 6:38 PM Jeff Jirsa  wrote:

> The commitlog code has changed DRASTICALLY between 2.x and trunk.
>
> If it's really a bunch of trailing 0s as was suggested later, then
> https://issues.apache.org/jira/browse/CASSANDRA-11995 addresses at least
> one cause/case of that particular bug.
>
>
>
> On Mon, Jul 26, 2021 at 3:11 PM Leon Zaruvinsky 
> wrote:
>
>> And for completeness, a sample stack trace:
>>
>> ERROR [2021-07-21T02:11:01.994Z] 
>> org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay. 
>> Commit disk failure policy is stop_on_startup; terminating thread 
>> (throwable0_message: Mutation checksum failure at 15167277 in 
>> CommitLog-5-1626828286977.log)
>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>>  Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
>>  at 
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
>>  at 
>> org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
>>  at 
>> org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
>>  at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222)
>>  at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
>>  at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:741)
>>
>>
>> On Mon, Jul 26, 2021 at 6:08 PM Leon Zaruvinsky 
>> wrote:
>>
>>> Currently we're using commitlog_batch:
>>>
>>> commitlog_sync: batch
>>> commitlog_sync_batch_window_in_ms: 2
>>> commitlog_segment_size_in_mb: 32
>>>
>>> durable_writes is also true.
>>>
>>> Unfortunately we are still using Cassandra 2.2.x :( Though I'd be
>>> curious if much in this space has changed since then (I've looked through
>>> the changelogs and nothing stood out).
>>>
>>> On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa  wrote:
>>>
>>>> What commitlog settings are you using?
>>>>
>>>> Default is periodic with 10s sync. That leaves you a 10s window on hard
>>>> poweroff/crash.
>>>>
>>>> I would also expect cassandra to cleanup and start cleanly, which
>>>> version are you running?
>>>>
>>>>
>>>>
>>>> On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky <
>>>> leonzaruvin...@gmail.com> wrote:
>>>>
>>>>> Hi Cassandra community,
>>>>>
>>>>> We (and others) regularly run into commit log corruptions that are
>>>>> caused by Cassandra, or the underlying infrastructure, being hard
>>>>> restarted.  I suspect that this is because it happens in the middle of a
>>>>> commitlog file write to disk.
>>>>>
>>>>> Could anyone point me at resources / code to understand why this is
>>>>> happening?  Shouldn't Cassandra not be acking writes until the commitlog 
>>>>> is
>>>>> safely written to disk?  I would expect that on startup, Cassandra should
>>>>> be able to clean up bad commitlog files and recover gracefully.
>>>>>
>>>>> I've seen various references online to this issue as something that
>>>>> will be fixed in the future - so I'm curious if there is any movement or
>>>>> thoughts there.
>>>>>
>>>>> Thanks a bunch,
>>>>> Leon
>>>>>
>>>>


Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
And for completeness, a sample stack trace:

ERROR [2021-07-21T02:11:01.994Z]
org.apache.cassandra.db.commitlog.CommitLog: Failed commit log replay.
Commit disk failure policy is stop_on_startup; terminating thread
(throwable0_message: Mutation checksum failure at 15167277 in
CommitLog-5-1626828286977.log)
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
Mutation checksum failure at 15167277 in CommitLog-5-1626828286977.log
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:647)
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:519)
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:401)
at 
org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:175)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:155)
at 
org.apache.cassandra.service.CassandraDaemon.recoverCommitlogAndCompleteSetup(CassandraDaemon.java:296)
at 
org.apache.cassandra.service.CassandraDaemon.completeSetupMayThrowSstableException(CassandraDaemon.java:289)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:630)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:741)


On Mon, Jul 26, 2021 at 6:08 PM Leon Zaruvinsky 
wrote:

> Currently we're using commitlog_batch:
>
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 2
> commitlog_segment_size_in_mb: 32
>
> durable_writes is also true.
>
> Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious
> if much in this space has changed since then (I've looked through the
> changelogs and nothing stood out).
>
> On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa  wrote:
>
>> What commitlog settings are you using?
>>
>> Default is periodic with 10s sync. That leaves you a 10s window on hard
>> poweroff/crash.
>>
>> I would also expect cassandra to cleanup and start cleanly, which version
>> are you running?
>>
>>
>>
>> On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky 
>> wrote:
>>
>>> Hi Cassandra community,
>>>
>>> We (and others) regularly run into commit log corruptions that are
>>> caused by Cassandra, or the underlying infrastructure, being hard
>>> restarted.  I suspect that this is because it happens in the middle of a
>>> commitlog file write to disk.
>>>
>>> Could anyone point me at resources / code to understand why this is
>>> happening?  Shouldn't Cassandra not be acking writes until the commitlog is
>>> safely written to disk?  I would expect that on startup, Cassandra should
>>> be able to clean up bad commitlog files and recover gracefully.
>>>
>>> I've seen various references online to this issue as something that will
>>> be fixed in the future - so I'm curious if there is any movement or
>>> thoughts there.
>>>
>>> Thanks a bunch,
>>> Leon
>>>
>>


Re: Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Currently we're using commitlog_batch:

commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
commitlog_segment_size_in_mb: 32

durable_writes is also true.

Unfortunately we are still using Cassandra 2.2.x :( Though I'd be curious
if much in this space has changed since then (I've looked through the
changelogs and nothing stood out).

On Mon, Jul 26, 2021 at 5:20 PM Jeff Jirsa  wrote:

> What commitlog settings are you using?
>
> Default is periodic with 10s sync. That leaves you a 10s window on hard
> poweroff/crash.
>
> I would also expect cassandra to cleanup and start cleanly, which version
> are you running?
>
>
>
> On Mon, Jul 26, 2021 at 1:00 PM Leon Zaruvinsky 
> wrote:
>
>> Hi Cassandra community,
>>
>> We (and others) regularly run into commit log corruptions that are caused
>> by Cassandra, or the underlying infrastructure, being hard restarted.  I
>> suspect that this is because it happens in the middle of a commitlog file
>> write to disk.
>>
>> Could anyone point me at resources / code to understand why this is
>> happening?  Shouldn't Cassandra not be acking writes until the commitlog is
>> safely written to disk?  I would expect that on startup, Cassandra should
>> be able to clean up bad commitlog files and recover gracefully.
>>
>> I've seen various references online to this issue as something that will
>> be fixed in the future - so I'm curious if there is any movement or
>> thoughts there.
>>
>> Thanks a bunch,
>> Leon
>>
>


Cassandra commitlog corruption on hard shutdown

2021-07-26 Thread Leon Zaruvinsky
Hi Cassandra community,

We (and others) regularly run into commit log corruptions that are caused
by Cassandra, or the underlying infrastructure, being hard restarted.  I
suspect that this is because it happens in the middle of a commitlog file
write to disk.

Could anyone point me at resources / code to understand why this is
happening?  Shouldn't Cassandra not be acking writes until the commitlog is
safely written to disk?  I would expect that on startup, Cassandra should
be able to clean up bad commitlog files and recover gracefully.

I've seen various references online to this issue as something that will be
fixed in the future - so I'm curious if there is any movement or thoughts
there.

Thanks a bunch,
Leon


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
> Our JVM options are unchanged between 2.2 and 3.11
>>
>
> For the sake of clarity, do you mean:
> (a) you're using the default JVM options in 3.11 and it's different to the
> options you had in 2.2?
> (b) you've copied the same JVM options you had in 2.2 to 3.11?
>

(b), which are the default options from 2.2 (and I believe the default
options in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually
a cause here because I would expect them to only impact the upgraded node
and not the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled


> The distinction is important because at the moment, you need to go through
> a process of elimination to identify the cause.
>
>
>> Read throughput (rate, bytes read/range scanned, etc.) seems fairly
>> consistent before and after the upgrade across all nodes.
>>
>
> What I was trying to get at is whether the upgraded node was getting hit
> with more traffic compared to the other nodes since it will indicate that
> the longer GCs are just the symptom, not the cause.
>
>
I don't see any distinct change, nor do I see an increase in traffic to the
upgraded node that would result in longer GC pauses.  Frankly I don't see
any changes or aberrations in client-related metrics at all that correlate
to the GC pauses, except for the corresponding timeouts.


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
Thanks Erick.



Our JVM options are unchanged between 2.2 and 3.11, and we have disk access
mode set to standard.  Generally we’ve maintained all configuration between
the two versions.


Read throughput (rate, bytes read/range scanned, etc.) seems fairly
consistent before and after the upgrade across all nodes.


Leon

On Wed, Oct 28, 2020 at 12:01 AM Erick Ramirez 
wrote:

> I haven't seen this specific behaviour in the past but things that I would
> look at are:
>
>- JVM options which differ between 3.11 defaults and what you have
>configured in 2.2
>- review your monitoring and check read throughput on the upgraded
>node as compared to 2.2 nodes
>- possibly not have disk access mode set to map index files only (not
>directly related to long GC pauses)
>
> If you're interested, I've written a post about disk access mode here --
> https://community.datastax.com/questions/6947/. Cheers!
>
>>


GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-27 Thread Leon Zaruvinsky
Hi,

I'm attempting an upgrade of Cassandra 2.2.18 to 3.11.6, but had to abort
because of major performance issues associated with GC pauses.

Details:
3 node cluster, RF 3, 1 DC
~2TB data per node
Heap Size: 12G / New Size: 5G

I didn't even get very far in the upgrade - I just upgraded the binary on a
single node to 3.11.6 (did not run upgradesstables) and let it sit.  Within
10 minutes, I started seeing elevated GC pressure and lots of timeouts in
the metrics.

All three nodes, not just the upgraded one, are seeing GC problems.
GC ParNew time jumped from 0.38% up to 3%.  CMS pause times went up to 30 seconds.

Once I turn off node on 3.11.6, the cluster eventually recovers.

Can anyone point me to ways to debug this?  I've taken heap dumps of all
nodes but nothing in particular stands out, and there are no
obvious messages in the logs that point to problems.


Re: Difference in num_tokens between Cassandra 2 and 3?

2020-08-06 Thread Leon Zaruvinsky
Thanks Erick, that confirms my suspicion.

Cheers!

On Thu, Aug 6, 2020 at 8:55 PM Erick Ramirez 
wrote:

> C* 3.0 added a new algorithm that optimised the token allocation
> (CASSANDRA-7032) [1] with allocate_tokens_for_keyspace in cassandra.yaml
> (originally allocate_tokens_keyspace but renamed) [2].
>
> Apart from that, there's no real change to how num_tokens work. What
> really changed is the philosophy on 256 being a bad default operationally.
> The new proposed default is 16 (CASSANDRA-13701). [3]
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-7032
> [2]
> https://github.com/apache/cassandra/commit/36d0f55d46ac0edb5a4f140c7993c6d207605fe7
> [3] https://issues.apache.org/jira/browse/CASSANDRA-13701
>


Difference in num_tokens between Cassandra 2 and 3?

2020-08-06 Thread Leon Zaruvinsky
Hi,

I'm currently investigating an upgrade for our Cassandra cluster from 2.2
to 3.11, and as part of that would like to understand if there is any
change in how the cluster behaves w.r.t. the number of tokens.  For historical
reasons, we have num_tokens set very high but want to make sure that this
is not more dangerous in a later version.

I've read recent threads on the new default, and the Netflix whitepaper, so
I'm fairly comfortable with the pros/cons of various token counts - but
specifically am interested about the difference in behavior between
Cassandra major versions, if one exists.

Thanks,
Leon


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Leon Zaruvinsky
Yep, Jeff is right, the intention would be to run a repair limited to the
available nodes.

On Wed, May 27, 2020 at 2:59 PM Jeff Jirsa  wrote:

> The "-hosts " flag tells cassandra to only compare trees/run repair on the
> hosts you specify, so if you have 3 replicas, but 1 replica is down, you
> can provide -hosts with the other two, and it will make sure those two are
> in sync (via merkle trees, etc), but ignore the third.
>
>
>
> On Wed, May 27, 2020 at 10:45 AM Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> If Cassandra is down how will it generate merkle tree to compare?
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On May 27, 2020, at 11:15 AM, Jeff Jirsa  wrote:
>>
>> 
>> You definitely can repair with a node down by passing `-hosts
>> specific_hosts`
>>
>> On Wed, May 27, 2020 at 9:06 AM Nitan Kainth 
>> wrote:
>>
>>> I didn't get you Leon,
>>>
>>> But, the simple thing is just to follow the steps and you will be fine.
>>> You can't run the repair if the node is down.
>>>
>>> On Wed, May 27, 2020 at 10:34 AM Leon Zaruvinsky <
>>> leonzaruvin...@gmail.com> wrote:
>>>
>>>> Hey Jeff/Nitan,
>>>>
>>>> 1) this concern should not be a problem if the repair happens before
>>>> the corrupted node is brought back online, right?
>>>> 2) in this case, is option (3) equivalent to replacing the node? where
>>>> we repair the two live nodes and then bring up the third node with no data
>>>>
>>>> Leon
>>>>
>>>> On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:
>>>>
>>>>> There’s two problems with this approach if you need strict correctness
>>>>>
>>>>> 1) after you delete the sstable and before you repair you’ll violate
>>>>> consistency, so you’ll potentially serve incorrect data for a while
>>>>>
>>>>> 2) The sstable may have a tombstone past gc grace that’s shadowing
>>>>> data in another sstable that’s not corrupt, and deleting it may resurrect
>>>>> that deleted data.
>>>>>
>>>>> The only strictly safe thing to do here, unfortunately, is to treat
>>>>> the host as failed and rebuild it from its neighbors (and again being
>>>>> pedantic here, that means stop the host, while it’s stopped, repair the
>>>>> surviving replicas, then bootstrap a replacement on top of the same
>>>>> tokens)
>>>>>
>>>>>
>>>>>
>>>>> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky <
>>>>> leonzaruvin...@gmail.com> wrote:
>>>>> >
>>>>> > 
>>>>> > Hi all,
>>>>> >
>>>>> > I'm looking to understand Cassandra's behavior in an sstable
>>>>> corruption scenario, and what the minimum amount of work is that needs to
>>>>> be done to remove a bad sstable file.
>>>>> >
>>>>> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
>>>>> > SStable corruption exception on one node at
>>>>> keyspace1/table1/lb-1-big-Data.db
>>>>> > Sstablescrub does not work.
>>>>> >
>>>>> > Is it safest to, after running a repair on the two live nodes,
>>>>> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
>>>>> > 2) Delete all files associated with that sstable (i.e.,
>>>>> keyspace1/table1/lb-1-*),
>>>>> > 3) Delete all files under keyspace1/table1/, or
>>>>> > 4) Any of the above are the same from a correctness perspective.
>>>>> >
>>>>> > Thanks,
>>>>> > Leon
>>>>> >
>>>>>
>>>>> -
>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>>
>>>>>


Re: Is deleting live sstable safe in this scenario?

2020-05-27 Thread Leon Zaruvinsky
Hey Jeff/Nitan,

1) this concern should not be a problem if the repair happens before the
corrupted node is brought back online, right?
2) in this case, is option (3) equivalent to replacing the node? where we
repair the two live nodes and then bring up the third node with no data

Leon

On Tue, May 26, 2020 at 10:11 PM Jeff Jirsa  wrote:

> There’s two problems with this approach if you need strict correctness
>
> 1) after you delete the sstable and before you repair you’ll violate
> consistency, so you’ll potentially serve incorrect data for a while
>
> 2) The sstable may have a tombstone past gc grace that’s shadowing data in
> another sstable that’s not corrupt, and deleting it may resurrect that
> deleted data.
>
> The only strictly safe thing to do here, unfortunately, is to treat the
> host as failed and rebuild it from its neighbors (and again being pedantic
> here, that means stop the host, while it’s stopped, repair the surviving
> replicas, then bootstrap a replacement on top of the same tokens)
>
>
>
> > On May 26, 2020, at 4:46 PM, Leon Zaruvinsky 
> wrote:
> >
> > 
> > Hi all,
> >
> > I'm looking to understand Cassandra's behavior in an sstable corruption
> scenario, and what the minimum amount of work is that needs to be done to
> remove a bad sstable file.
> >
> > Consider: 3 node, RF 3 cluster, reads/writes at quorum
> > SStable corruption exception on one node at
> keyspace1/table1/lb-1-big-Data.db
> > Sstablescrub does not work.
> >
> > Is it safest to, after running a repair on the two live nodes,
> > 1) Delete only keyspace1/table1/lb-1-big-Data.db,
> > 2) Delete all files associated with that sstable (i.e.,
> keyspace1/table1/lb-1-*),
> > 3) Delete all files under keyspace1/table1/, or
> > 4) Any of the above are the same from a correctness perspective.
> >
> > Thanks,
> > Leon
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Is deleting live sstable safe in this scenario?

2020-05-26 Thread Leon Zaruvinsky
Hi all,

I'm looking to understand Cassandra's behavior in an sstable corruption
scenario, and what the minimum amount of work is that needs to be done to
remove a bad sstable file.

Consider: 3 node, RF 3 cluster, reads/writes at quorum
SStable corruption exception on one node at
keyspace1/table1/lb-1-big-Data.db
Sstablescrub does not work.

Is it safest to, after running a repair on the two live nodes,
1) Delete only keyspace1/table1/lb-1-big-Data.db,
2) Delete all files associated with that sstable (i.e.,
keyspace1/table1/lb-1-*),
3) Delete all files under keyspace1/table1/, or
4) Any of the above are the same from a correctness perspective.

Thanks,
Leon


Re: Cassandra build failing after Central Repository HTTPS

2020-01-15 Thread Leon Zaruvinsky
I think I've found the source of the issue.  We are using assertj-core on
our fork of the repository, which is attempting to pull in the junit-bom
5.4.0 POM.  That POM doesn't seem to be published on either of the https
repositories, so ant tries to hit http as a backup and then fails with
the error below.



It's curious that Junit was published to the http but not https repository.



Either way, thanks for the assistance in debugging!

On Wed, Jan 15, 2020 at 8:19 PM Leon Zaruvinsky
 wrote:

> I agree that something feels wonky on the Circle container.  I was able to
> successfully build locally.
>
> I SSHed into the container, cleared out .m2/repository and still can't get
> it to build.  Using ant 1.10.7 with environment:
>
> openjdk version "1.8.0_222"
> OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1~deb9u1-b10)
> OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
>
> On 1/15/20, 7:14 PM, "Michael Shuler"  wrote:
>
> I just did a quick `rm -r ~/.m2/repository/` so the build would
> download
> everything, and with 2.2.14 tag + 63ff65a8dd, the build succeeded for
> me
> fine, pulling everything from https. Not sure where to go, unless the
> circleci container is somehow contaminated with http cached data.
>
> Michael
>
> On 1/15/20 6:05 PM, Michael Shuler wrote:
> > Bleh, sorry, you updated those, right.
> >
> > I don't see any other related commits to build.xml nor
> > build.properties.default.. Some elderly sub-dependent that has an
> http
> > URL in it, perhaps?
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>
>
>


Re: Cassandra build failing after Central Repository HTTPS

2020-01-15 Thread Leon Zaruvinsky
I agree that something feels wonky on the Circle container.  I was able to 
successfully build locally.

I SSHed into the container, cleared out .m2/repository and still can't get it 
to build.  Using ant 1.10.7 with environment:

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1~deb9u1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

On 1/15/20, 7:14 PM, "Michael Shuler"  wrote:

I just did a quick `rm -r ~/.m2/repository/` so the build would download 
everything, and with 2.2.14 tag + 63ff65a8dd, the build succeeded for me 
fine, pulling everything from https. Not sure where to go, unless the 
circleci container is somehow contaminated with http cached data.

Michael

On 1/15/20 6:05 PM, Michael Shuler wrote:
> Bleh, sorry, you updated those, right.
> 
> I don't see any other related commits to build.xml nor 
> build.properties.default.. Some elderly sub-dependent that has an http 
> URL in it, perhaps?
> 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org





Re: Cassandra build failing after Central Repository HTTPS

2020-01-15 Thread Leon Zaruvinsky
Hey Michael,

That’s basically what I did, but I'm still getting the error unfortunately.  The 
first artifacts do resolve over https, but the transitive 
dependency does not.

Thanks

> On Jan 15, 2020, at 7:00 PM, Michael Shuler  wrote:
> 
> You could cherry-pick sha 63ff65a8dd3a31e500ae5ec6232f1f9eade6fa3d which was 
> committed after the 2.2.14 release tag.
> 
> https://github.com/apache/cassandra/commit/63ff65a8dd3a31e500ae5ec6232f1f9eade6fa3d
> 
> -- 
> Kind regards,
> Michael
> 
>> On 1/15/20 5:44 PM, Leon Zaruvinsky wrote:
>> Hey all,
>> I'm having trouble building Cassandra 2.2.14 on CircleCI since 
>> Central Repo has stopped supporting http.*
>> * https://central.sonatype.org/articles/2020/Jan/15/501-https-required-error/
>> I've updated the build.properties and build.xml files to use https.  
>> However, it seems that the ant build starts to use https and then switches 
>> to http after a few artifacts:
>> maven-ant-tasks-retrieve-build: [artifact:dependencies] Downloading: 
>> junit/junit/4.6/junit-4.6.pom from repository central at 
>> https://repo1.maven.org/maven2 [artifact:dependencies] Transferring 1K from 
>> central [artifact:dependencies] Downloading: 
>> org/assertj/assertj-core/3.12.0/assertj-core-3.12.0.pom from repository 
>> central at https://repo1.maven.org/maven2 [artifact:dependencies] 
>> Transferring 13K from central [artifact:dependencies] Downloading: 
>> org/assertj/assertj-parent-pom/2.2.2/assertj-parent-pom-2.2.2.pom from 
>> repository central at https://repo1.maven.org/maven2 [artifact:dependencies] 
>> Transferring 16K from central [artifact:dependencies] Downloading: 
>> org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom from repository central at 
>> http://repo1.maven.org/maven2 [artifact:dependencies] Error transferring 
>> file: Server returned HTTP response code: 501 for URL: 
>> http://repo1.maven.org/maven2/org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom 
>> [artifact:dependencies] [WARNING] Unable to get resource 
>> 'org.junit:junit-bom:pom:5.4.0' from repository central 
>> (http://repo1.maven.org/maven2): Error transferring file: Server returned 
>> HTTP response code: 501 for URL: 
>> http://repo1.maven.org/maven2/org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom 
>> [artifact:dependencies] An error has occurred while processing the Maven 
>> artifact tasks. [artifact:dependencies] Diagnosis: [artifact:dependencies] 
>> [artifact:dependencies] Unable to resolve artifact: Unable to get dependency 
>> information: Unable to read the metadata file for artifact 
>> 'org.assertj:assertj-core:jar': POM 'org.junit:junit-bom' not found in 
>> repository: Unable to download the artifact from any repository 
>> [artifact:dependencies] [artifact:dependencies] 
>> org.junit:junit-bom:pom:5.4.0 [artifact:dependencies] 
>> [artifact:dependencies] from the specified remote repositories: 
>> [artifact:dependencies] central (http://repo1.maven.org/maven2) 
>> [artifact:dependencies] [artifact:dependencies] for project 
>> org.junit:junit-bom [artifact:dependencies] 
>> org.assertj:assertj-core:jar:3.12.0 [artifact:dependencies] 
>> [artifact:dependencies] from the specified remote repositories: 
>> [artifact:dependencies] apache (https://repo.maven.apache.org/maven2), 
>> [artifact:dependencies] central (https://repo1.maven.org/maven2) 
>> [artifact:dependencies] [artifact:dependencies] Path to dependency: 
>> [artifact:dependencies] 1) 
>> org.apache.cassandra:cassandra-build-deps:jar:2.2.14-SNAPSHOT 
>> [artifact:dependencies] [artifact:dependencies]
>> There are no references to http://repo1.maven.org/maven2 anywhere in my 
>> repo.   One theory is that this reference is being automatically injected 
>> somewhere, but I'm not sure where or how to stop it.
>> Has anyone else encountered this or has suggestions for how to fix?
>> Thanks,
>> Leon
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Cassandra build failing after Central Repository HTTPS

2020-01-15 Thread Leon Zaruvinsky
Hey all,

I'm having trouble building Cassandra 2.2.14 on CircleCI since
Central Repo has stopped supporting http.*

*
https://central.sonatype.org/articles/2020/Jan/15/501-https-required-error/

I've updated the build.properties and build.xml files to use https.
However, it seems that the ant build starts to use https and then switches
to http after a few artifacts:



maven-ant-tasks-retrieve-build:
[artifact:dependencies] Downloading: junit/junit/4.6/junit-4.6.pom
from repository central at https://repo1.maven.org/maven2
[artifact:dependencies] Transferring 1K from central
[artifact:dependencies] Downloading:
org/assertj/assertj-core/3.12.0/assertj-core-3.12.0.pom from
repository central at https://repo1.maven.org/maven2
[artifact:dependencies] Transferring 13K from central
[artifact:dependencies] Downloading:
org/assertj/assertj-parent-pom/2.2.2/assertj-parent-pom-2.2.2.pom from
repository central at https://repo1.maven.org/maven2
[artifact:dependencies] Transferring 16K from central
[artifact:dependencies] Downloading:
org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom from repository central
at http://repo1.maven.org/maven2
[artifact:dependencies] Error transferring file: Server returned HTTP
response code: 501 for URL:
http://repo1.maven.org/maven2/org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom
[artifact:dependencies] [WARNING] Unable to get resource
'org.junit:junit-bom:pom:5.4.0' from repository central
(http://repo1.maven.org/maven2): Error transferring file: Server
returned HTTP response code: 501 for URL:
http://repo1.maven.org/maven2/org/junit/junit-bom/5.4.0/junit-bom-5.4.0.pom
[artifact:dependencies] An error has occurred while processing the
Maven artifact tasks.
[artifact:dependencies]  Diagnosis:
[artifact:dependencies]
[artifact:dependencies] Unable to resolve artifact: Unable to get
dependency information: Unable to read the metadata file for artifact
'org.assertj:assertj-core:jar': POM 'org.junit:junit-bom' not found in
repository: Unable to download the artifact from any repository
[artifact:dependencies]
[artifact:dependencies]   org.junit:junit-bom:pom:5.4.0
[artifact:dependencies]
[artifact:dependencies] from the specified remote repositories:
[artifact:dependencies]   central (http://repo1.maven.org/maven2)
[artifact:dependencies]
[artifact:dependencies]  for project org.junit:junit-bom
[artifact:dependencies]   org.assertj:assertj-core:jar:3.12.0
[artifact:dependencies]
[artifact:dependencies] from the specified remote repositories:
[artifact:dependencies]   apache (https://repo.maven.apache.org/maven2),
[artifact:dependencies]   central (https://repo1.maven.org/maven2)
[artifact:dependencies]
[artifact:dependencies] Path to dependency:
[artifact:dependencies] 1)
org.apache.cassandra:cassandra-build-deps:jar:2.2.14-SNAPSHOT
[artifact:dependencies]
[artifact:dependencies]



There are no references to http://repo1.maven.org/maven2 anywhere in my
repo.   One theory is that this reference is being automatically injected
somewhere, but I'm not sure where or how to stop it.

Has anyone else encountered this or has suggestions for how to fix?

Thanks,
Leon


Breaking up major compacted Sstable with TWCS

2019-07-11 Thread Leon Zaruvinsky
Hi,

We are switching a table to run using TWCS. However, after running the alter 
statement, we ran a major compaction without understanding the implications.

Now, while new sstables are properly being created according to the time 
window, there is a giant sstable sitting around waiting for expiration.

Is there a way we can break it up again?  Running the alter statement again 
doesn’t seem to be touching it.

Thanks,
Leon

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org