Re: OOM after a while during compacting

2018-04-05 Thread Nate McCall
>
>
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got
> an OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried
> setting throughput between 16 and 128, no changes.
>

That heap size is way too small for G1GC. Switch back to the defaults with
CMS. IME, G1 needs > 20GB of heap just to show improvements over CMS (but this
also depends on workload and a few other factors). Stick with the CMS
defaults unless you have some evidence-based experiment to try.
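
For reference, the stock conf/jvm.options that ships with 3.11 has the CMS
section enabled and the G1 section commented out, so reverting is mostly a
matter of restoring something along these lines (quoting from memory, so
double-check against the file in your own install):

    # CMS settings (the shipped defaults)
    -XX:+UseParNewGC
    -XX:+UseConcMarkSweepGC
    -XX:+CMSParallelRemarkEnabled
    -XX:SurvivorRatio=8
    -XX:MaxTenuringThreshold=1
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:+UseCMSInitiatingOccupancyOnly
    # ...and re-comment -XX:+UseG1GC and the rest of the G1 section.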

Also worth noting that with a 1TB gp2 EBS volume, you only have 3k IOPS to
play with before you are subject to rate limiting. If you allocate a volume
greater than 3.33TB, you get 10K IOPS and the rate limiting goes away (you
can see this by playing around with the EBS sizing in the AWS calculator:
http://calculator.s3.amazonaws.com/index.html). Another common mistake here
is accidentally putting the commitlog on the boot volume, which has a very
low IOPS allocation given it's only 64GB (iirc) by default.
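
If I have the gp2 math right, the baseline is 3 IOPS per provisioned GiB,
capped at 10,000, which is where both numbers above come from:

    1,000 GiB x 3 IOPS/GiB =  ~3,000 IOPS  (a 1TB volume)
    3,334 GiB x 3 IOPS/GiB = ~10,000 IOPS  (the cap, so ~3.33TB and larger volumes are never throttled below it)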


Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Yeah, the indexed values are pretty much unique, but it's only a few requests per
day, so hitting all the nodes would be fine for now.



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Not sure if it differs for SASI Secondary Indexes, but my understanding is it’s
a bad idea to use high-cardinality columns for Secondary Indexes.
Not sure what your data model looks like, but I’d assume UUIDs would have very
high cardinality.

If that’s the case it pretty much guarantees any query on the secondary index 
will hit all the nodes, which is what you want to avoid.
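
To illustrate the fan-out (table and column names made up): a lookup by partition
key touches only the replicas for that key, whereas an equality lookup on an
indexed non-key column has to be fanned out, since matching rows can live on any node:

    -- partition-key lookup: only the replicas for that key
    SELECT * FROM app.events WHERE id = 123e4567-e89b-12d3-a456-426614174000;

    -- lookup on the indexed non-key column: the coordinator fans out across the cluster
    SELECT * FROM app.events WHERE external_id = 123e4567-e89b-12d3-a456-426614174000;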

Also, Secondary Indexes are generally a poor fit for Cassandra; if you don’t need
them, or there's a way around using them, I’d go with that.

Regards,
Eevee.




Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Tried both (although with the biggest table) and the result is the same.

I stumbled upon this jira issue: https://issues.apache.org/jira/browse/CASSANDRA-12662
Since the SASI indexes I use are only helping in debugging (for now), I
dropped them and it seems the tables get compacted now (at least it made it
further than before and the jvm metrics look healthy).

Still, this is not ideal, as it would be nice to have those secondary
indexes :/

The columns I indexed are basically uuids (so I can match the rows from
other systems, but this is usually triggered manually, so the performance loss is
acceptable).
Is there a recommended index to use here? Or should I set
the max_compaction_flush_memory_in_mb value? I saw that it can cause
different kinds of problems... Or should I use the default secondary index?
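
For reference, my understanding is that max_compaction_flush_memory_in_mb is a
per-index SASI option supplied when the index is created, so capping it would look
roughly like this (index/table/column names are placeholders and 128 is just an
example value):

    CREATE CUSTOM INDEX ext_id_idx ON my_ks.my_table (external_id)
        USING 'org.apache.cassandra.index.sasi.SASIIndex'
        WITH OPTIONS = { 'max_compaction_flush_memory_in_mb': '128' };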

Thanks



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Oh and second, are you attempting a major compact while you have all those 
pending compactions?

Try letting the cluster catch up on compactions. Having that many pending is 
bad.

If you have a replication factor of 3 and read/write at QUORUM, you could go node
by node: disable binary, raise concurrent compactors to 4, and unthrottle compactions
by setting throughput to zero. This can help it catch up on those compactions.
Then you can deal with trying a major compaction.
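
A rough per-node sequence for that (a sketch; command availability varies a bit by
version, and 16 MB/s is the default throughput to restore afterwards):

    nodetool disablebinary              # stop serving CQL clients on this node
    nodetool setconcurrentcompactors 4  # if your nodetool has it; otherwise raise concurrent_compactors in cassandra.yaml and restart
    nodetool setcompactionthroughput 0  # 0 = unthrottled
    nodetool compactionstats            # watch the pending count drain
    nodetool setcompactionthroughput 16 # restore the default once caught up
    nodetool enablebinary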

Regards,
Evelyn.




Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Probably a dumb question but it’s good to clarify.

Are you compacting the whole keyspace or are you compacting tables one at a 
time?
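
For reference, in nodetool terms (keyspace/table names are placeholders):

    nodetool compact my_keyspace my_table   # one table at a time
    nodetool compact my_keyspace            # every table in the keyspace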




OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Hi!

I have a setup with 4 AWS nodes (m4.xlarge - 4 cpu, 16gb ram, 1TB ssd each),
and when running the nodetool compact command on any of the servers I get
an out-of-memory exception after a while.

- Before calling the compact I first did a repair, and before that there was
a bigger update on a lot of entries, so I guess a lot of sstables were
created. The repair created around 250 pending compaction tasks. I managed to
finish 2 of the nodes by upgrading them to 2xlarge machines with twice the
heap (but running the compact on them manually also killed one :/, so this
isn't an ideal solution)

Some more info:
- Version is the newest 3.11.2 with java8u116
- Using LeveledCompactionStrategy (we have mostly reads)
- Heap size is set to 8GB
- Using G1GC
- I tried moving the memtables off-heap. It helped, but I still got an
OOM last night
- Concurrent compactors is set to 1 but it still happens; I also tried setting
compaction throughput between 16 and 128, no change (the relevant cassandra.yaml
keys are sketched after this list).
- Storage load is 127GB/140GB/151GB/155GB
- 1 keyspace, 16 tables, but there are a few SASI indexes on the big tables.
- The biggest partition I found was 90MB, but that table has only 2 sstables
attached and compacts in seconds. The rest are mostly single-row partitions with
a few tens of KB of data.
- Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0,
0, 0, 0]
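
Roughly how those settings look in cassandra.yaml (a sketch; values as described
above, and the off-heap memtable change is the memtable_allocation_type switch):

    concurrent_compactors: 1
    compaction_throughput_mb_per_sec: 16        # also tried values up to 128
    memtable_allocation_type: offheap_objects   # was heap_buffers before the off-heap change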

In the metrics it looks something like this before dying:
https://ibb.co/kLhdXH

What the heap dump looks like of the top objects: https://ibb.co/ctkyXH

The load is usually pretty low and the nodes are almost idling (avg 500
reads/sec, 30-40 writes/sec, with occasional few-second spikes of >100
writes), and the pending task count is also usually around 0.

Any ideas? I'm starting to run out of them. Maybe the secondary indexes are
causing problems? I could finish some bigger compactions where there was no
index attached, but I'm not 100% sure this is the cause.

Thanks,
Zsolt