Re: Cleanup blocking snapshots - Options?

2018-01-31 Thread kurt greaves
Thanks Thomas. I'll give it a shot myself and see whether backporting the
patch fixes the problem. If it does, I'll create a new ticket for the backport.



RE: Cleanup blocking snapshots - Options?

2018-01-30 Thread Steinmaurer, Thomas
Hi Kurt,

I had another try now and, yes, with 2.1.18 this happens consistently. I am
currently running nodetool cleanup on a single node in production with the
hourly snapshots disabled. SSTables > 100 GB are involved here. Triggering
nodetool snapshot while cleanup is running results in the snapshot being
blocked. From an operational perspective, this is a bit annoying right now.
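
One way to keep the hourly job from hanging silently on this is to guard the
nodetool snapshot call with a timeout. A minimal sketch, assuming the job is a
script under our control; the snapshot tag and the timeout value are
placeholders:

    import subprocess

    SNAPSHOT_TIMEOUT_S = 300  # assumption: an unblocked snapshot finishes well within this

    try:
        # "hourly" is a placeholder tag; -t names the snapshot to create.
        subprocess.run(["nodetool", "snapshot", "-t", "hourly"],
                       check=True, timeout=SNAPSHOT_TIMEOUT_S)
    except subprocess.TimeoutExpired:
        # Most likely the snapshot is waiting on the compaction lock held by a
        # long-running cleanup (CASSANDRA-11155), so alert instead of hanging.
        print("snapshot not done after %ss - possibly blocked by cleanup"
              % SNAPSHOT_TIMEOUT_S)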

I have asked on https://issues.apache.org/jira/browse/CASSANDRA-13873 about a
backport to 2.1, but it possibly won't get attention because the ticket has
already been resolved for 2.2+.

Regards,
Thomas



Re: Cleanup blocking snapshots - Options?

2018-01-15 Thread Nicolas Guyomar
Hi,

It might really be a long shot, but I thought a user-defined compaction
triggered via JMX on a single SSTable might remove data the node does not own
(to answer the "Any other way to re-write SSTables with data a node owns after
a cluster scale out" part of your question).

I might be wrong though.
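
For illustration, a rough sketch of how such a user-defined compaction could
be triggered from a script through the CompactionManager MBean. It assumes the
jmxterm CLI is available and JMX listens on the default port 7199; the jar
path and SSTable filename are hypothetical:

    import subprocess

    JMXTERM_JAR = "jmxterm-1.0-uber.jar"            # assumed path to the jmxterm CLI
    SSTABLE = "mykeyspace-mytable-ka-1234-Data.db"  # hypothetical Data.db filename

    # forceUserDefinedCompaction rewrites exactly the listed SSTable; whether
    # that rewrite also drops ranges the node no longer owns is the open
    # question above.
    cmd = ("run -b org.apache.cassandra.db:type=CompactionManager "
           "forceUserDefinedCompaction " + SSTABLE)
    subprocess.run(["java", "-jar", JMXTERM_JAR, "-l", "localhost:7199", "-n"],
                   input=(cmd + "\n").encode(), check=True)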



RE: Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hi Kurt,

It was easily triggered with the mentioned combination (cleanup after
extending the cluster) a few months ago, so I guess it will be the same when I
re-try. Due to the issue we simply skipped running cleanup back then, but as
disk space is becoming something of a bottleneck again, we need to re-evaluate
the situation ☺

Regards,
Thomas



Re: Cleanup blocking snapshots - Options?

2018-01-14 Thread kurt greaves
Disabling the snapshots is the best and only real option other than upgrading
at the moment. Apparently it was thought that the race condition in 2.1 that
triggers this was small enough not to be worth fixing. If you are triggering
it easily, maybe it is worth fixing in 2.1 as well. Does this happen
consistently? Can you provide some more logs on the JIRA, or better yet a way
to reproduce?
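
A minimal sketch of that disable-then-cleanup workaround, assuming the hourly
snapshot is driven by a cron.d entry (the path below is hypothetical); the
finally block restores the job even if cleanup fails:

    import os
    import subprocess

    CRON_FILE = "/etc/cron.d/cassandra-hourly-snapshot"  # hypothetical path
    DISABLED = CRON_FILE + ".disabled"

    os.rename(CRON_FILE, DISABLED)      # take the snapshot job out of cron's view
    try:
        # cleanup rewrites the node's SSTables, dropping partitions it no
        # longer owns after the scale-out
        subprocess.run(["nodetool", "cleanup"], check=True)
    finally:
        os.rename(DISABLED, CRON_FILE)  # restore the job even if cleanup failed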

On 14 January 2018 at 16:12, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> we are running 2.1.18 with vnodes in production, and due to
> https://issues.apache.org/jira/browse/CASSANDRA-11155 we can't run cleanup
> (e.g. after extending the cluster) without blocking our hourly snapshots.
>
>
>
> What options do we have to get rid of partitions a node does not own
> anymore?
>
> · Using a version which has this issue fixed, although upgrading to 2.2+
> is not an option at the moment due to various issues
>
> · Temporarily disabling the hourly cron job before starting cleanup and
> re-enabling it after cleanup has finished
>
> · Any other way to re-write SSTables with only the data a node owns after
> a cluster scale-out
>
>
>
> Thanks,
>
> Thomas