Re: OOM after a while during compacting

2018-04-05 Thread Nate McCall
>
>
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got
> an OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried
> setting throughput between 16 and 128, no changes.
>

That heap size is way too small for G1GC. Switch back to the defaults with
CMS. IME, G1 needs > 20g for *just* the JVM to see improvements (but this
also depends on workload and a few other factors). Stick with the CMS
defaults unless you have some evidence-based experiment to try.
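
For reference, on a 3.11 install the GC flags live in conf/jvm.options; going back
to the stock CMS settings roughly means re-enabling the CMS block and commenting
out the G1 block, along these lines (a sketch, not copied from any particular
version, so diff against a pristine jvm.options before applying):

  ### CMS settings (the shipped defaults)
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:+CMSParallelRemarkEnabled
  -XX:SurvivorRatio=8
  -XX:MaxTenuringThreshold=1
  -XX:CMSInitiatingOccupancyFraction=75
  -XX:+UseCMSInitiatingOccupancyOnly

  ### G1 settings: comment these out when reverting to CMS
  #-XX:+UseG1GC
  #-XX:MaxGCPauseMillis=500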

Also worth noting that with a 1TB gp2 EBS volume, you only have 3k IOPS to
play with before you are subject to rate limiting. If you allocate a volume
greater than 3.33TB, you get 10K IOPS and the rate limiting goes away (you
can see this playing around with the EBS sizing in the AWS calculator:
http://calculator.s3.amazonaws.com/index.html). Another common mistake here
is accidentally putting the commitlog on the boot volume, which has a very
low IOPS allotment given it's only 64GB (IIRC) by default.
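
A quick way to sanity-check the volume layout (a sketch; the commitlog path and
config location below are assumptions, adjust to your install):

  grep commitlog_directory /etc/cassandra/cassandra.yaml
  df -h /var/lib/cassandra/commitlog    # which device actually backs the commitlog?
  iostat -xm 5                          # per-device IOPS while a compaction is running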


Upgrade to 3.11.2 disabled JMX

2018-04-05 Thread Lucas Benevides
Dear community members,

I have just upgraded my Cassandra from version 3.11.1 to 3.11.2. I kept my
previous configuration files: cassandra.yaml and cassandra-env.sh. However,
when I started the Cassandra service, I couldn't connect via JMX (I tried to
do it with a Java program, with JConsole, and with a Prometheus client).

When I run netstat -na it does not show port 7199 open.
Tried to look at the logs but didn't see anything.

Can you figure out why it happened and point to any possible solution? The
config files enable JMX with authentication=false, but it doesn't work.
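
(For context, the cassandra-env.sh lines in question look roughly like the sketch
below; this is from memory rather than the shipped file, so treat it as illustrative
only:)

  LOCAL_JMX=no          # "yes" keeps JMX bound to localhost only
  JMX_PORT="7199"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"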

Thanks in advance,
Lucas Benevides


Re: Shifting data to DCOS

2018-04-05 Thread Michael Shuler
On 04/05/2018 09:04 AM, Faraz Mateen wrote:
> 
> For example,  if the table is *data_main_bim_dn_10*, its data directory
> is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
> a new table with the same name through cqlsh. This resulted in creation
> of another directory with a different hash i.e.
> data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
> from the former to the latter. 
> 
> Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
> was able to access all data contents through cqlsh.
> 
> Now, the problem is, I have around 500 tables and the method I mentioned
> above is quite cumbersome. Bulkloading through sstableloader or remote
> seeding are also a couple of options but they will take a lot of time.
> Does anyone know an easier way to shift all my data to new setup on DC/OS?

For upgrade support from older versions of C* that did not have the hash
on the data directory, the table data dir can be just
`data_main_bim_dn_10` without the appended hash, as in your example.

Give that a quick test to see if that simplifies things for you.
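
A rough sketch of what that rename pass might look like (untested, and it assumes
the default data file layout and table names without hyphens):

  cd /var/lib/cassandra/data/ks1
  for d in *-*; do
    mv "$d" "${d%-*}"    # data_main_bim_dn_10-a732...  ->  data_main_bim_dn_10
  done

Then restart the node (or try "nodetool refresh ks1 <table>" per table) so the
sstables in the hashless directories get picked up.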

-- 
Kind regards,
Michael

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Yeah, they are pretty much unique but it's only a few requests per day so
hitting all the nodes would be fine for now.

2018-04-05 15:43 GMT+02:00 Evelyn Smith :

> Not sure if it differs for SASI Secondary Indexes but my understanding is
> it’s a bad idea to use high cardinality columns for Secondary Indexes.
> Not sure what your data model looks like but I’d assume UUID would have
> very high cardinality.
>
> If that’s the case it pretty much guarantees any query on the secondary
> index will hit all the nodes, which is what you want to avoid.
>
> Also Secondary Indexes are generally bad for Cassandra, if you don’t need
> them or there's a way around using them I’d go with that.
>
> Regards,
> Eevee.
>
>
> On 5 Apr 2018, at 11:27 pm, Zsolt Pálmai  wrote:
>
> Tried both (although with the biggest table) and the result is the same.
>
> I stumbled upon this jira issue:
> https://issues.apache.org/jira/browse/CASSANDRA-12662
> Since the sasi indexes I use are only helping in debugging (for now) I
> dropped them and it seems the tables get compacted now (at least it made it
> further than before and the JVM metrics look healthy).
>
> Still this is not ideal as it would be nice to have those secondary
> indexes :/ .
>
> The columns I indexed are basically uuids (so I can match the rows from
> other systems but this is usually triggered manually so performance loss is
> acceptable).
> Is there a recommended index to use here? Or setting
> the max_compaction_flush_memory_in_mb value? I saw that it can cause
> different kind of problems... Or the default secondary index?
>
> Thanks
>
>
>
> 2018-04-05 15:14 GMT+02:00 Evelyn Smith :
>
>> Probably a dumb question but it’s good to clarify.
>>
>> Are you compacting the whole keyspace or are you compacting tables one at
>> a time?
>>
>>
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai  wrote:
>>
>> Hi!
>>
>> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd
>> each) and when running the nodetool compact command on any of the servers I
>> get out of memory exception after a while.
>>
>> - Before calling the compact first I did a repair and before that there
>> was a bigger update on a lot of entries so I guess a lot of sstables were
>> created. The repair created around ~250 pending compaction tasks, 2 of the
>> nodes I managed to finish with upgrading to a 2xlarge machine and twice the
>> heap (but running the compact on them manually also killed one :/ so this
>> isn't an ideal solution)
>>
>> Some more info:
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped but I still got
>> an OOM last night
>> - Concurrent compactors is set to 1 but it still happens and also tried
>> setting throughput between 16 and 128, no changes.
>> - Storage load is 127Gb/140Gb/151Gb/155Gb
>> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90Mb but that table has only 2
>> sstables attached and compacts in seconds. The rest is mostly 1 line
>> partition with a few 10KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0,
>> 0, 0, 0, 0]
>>
>> In the metrics it looks something like this before dying:
>> https://ibb.co/kLhdXH
>>
>> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH
>>
>> The load is usually pretty low, the nodes are almost idling (avg 500
>> reads/sec, 30-40 writes/sec with occasional few second spikes with >100
>> writes) and the pending tasks is also around 0 usually.
>>
>> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes
>> cause problems? I could finish some bigger compactions where there was no
>> index attached but I'm not sure 100% if this is the cause.
>>
>> Thanks,
>> Zsolt
>>
>>
>>
>>
>>
>
>


Shifting data to DCOS

2018-04-05 Thread Faraz Mateen
Hi all,

I have been spending the last few days trying to move my C* cluster on
Gcloud (3 nodes, 700GB) into a DC/OS deployment. This, as you people might
know, was not trivial.

I have finally found a way to do this migration in a time-efficient way (We
evaluated bulkloading and sstableloader, but these would take much too
long, especially if we want to repeat this process between different
deployments).

I would really appreciate it if you could review my approach below and comment
on where I can do something better (or automate it using existing tools
that I might not have stumbled across).

All the data from my previous setup is on persistent disks. I created
copies of those persistent disks and attached them to DC/OS agents. When
deploying the service on DC/OS, I specified disk type as MOUNT and provided
the same cluster name as my previous setup.

After the service was successfully deployed, I logged into cqlsh. I was
able to see all the keyspaces but all the column families were missing.
When I rechecked my data directory on the persistent disk, I was able to
see all my data in different directories. Each directory has a hash
attached to its name.

For example,  if the table is *data_main_bim_dn_10*, its data directory is
named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created a new
table with the same name through cqlsh. This resulted in creation of
another directory with a different hash i.e.
data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
from the former to the latter.

Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I was
able to access all data contents through cqlsh.

Now, the problem is, I have around 500 tables and the method I mentioned
above is quite cumbersome. Bulkloading through sstableloader or remote
seeding are also a couple of options but they will take a lot of time. Does
anyone know an easier way to shift all my data to new setup on DC/OS?
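
(For completeness, the per-table steps above could at least be scripted; a rough,
untested sketch, assuming the old data was copied aside, the schema has already
been re-created so the new hashed directories exist, and the paths below match
your install:)

  OLD=/mnt/old_data/ks1                 # copy of the previous cluster's data dir
  NEW=/var/lib/cassandra/data/ks1
  for olddir in "$OLD"/*-*; do
    table="$(basename "${olddir%-*}")"              # e.g. data_main_bim_dn_10
    newdir="$(ls -d "$NEW/$table"-*/ | head -1)"    # dir created when the table was re-created
    cp "$olddir"/* "$newdir"
    nodetool refresh ks1 "$table"
  done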

-- 
Faraz Mateen


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hi, Evelyn!

I've found the following messages:

INFO RepairRunnable.java Starting repair command #41, repairing keyspace
XXX with repair options (parallelism: parallel, primary range: false,
incremental: false, job threads: 1, ColumnFamilies: [YYY], dataCenters: [],
hosts: [], # of ranges: 768)
INFO CompactionExecutor:6 CompactionManager.java Starting anticompaction
for XXX.YYY on 5132/5846 sstables

After that many similar messages go:
SSTable
BigTableReader(path='/mnt/cassandra/data/XXX/YYY-4c12fd9029e611e8810ac73ddacb37d1/lb-12688-big-Data.db')
fully contained in range (-9223372036854775808,-9223372036854775808],
mutating repairedAt instead of anticompacting

Does it mean that anti-compaction is not the cause?

2018-04-05 18:01 GMT+05:00 Evelyn Smith :

> It might not be what cause it here. But check your logs for
> anti-compactions.
>
>
> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov  wrote:
>
> Thank you!
> I'll check this out.
>
> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski :
>
>> 40 pending compactions is pretty high and you should have way less than
>> that most of the time, otherwise it means that compaction is not keeping up
>> with your write rate.
>>
>> If you indeed have SSDs for data storage, increase your compaction
>> throughput to 100 or 200 (depending on how the CPUs handle the load). You
>> can experiment with compaction throughput using : nodetool
>> setcompactionthroughput 100
>>
>> You can raise the number of concurrent compactors as well and set it to a
>> value between 4 and 6 if you have at least 8 cores and CPUs aren't
>> overwhelmed.
>>
>> I'm not sure why you ended up with only one node having 6k SSTables and
>> not the others, but you should apply the above changes so that you can
>> lower the number of pending compactions and see if it prevents the issue
>> from happening again.
>>
>> Cheers,
>>
>>
>> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov 
>> wrote:
>>
>>> Hi, Alexander!
>>>
>>> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
>>> Current compaction throughput is 16 MB/s (default value).
>>>
>>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>>> in "tpstats".
>>> Mostly because of another (bigger) keyspace in this cluster.
>>> But the situation is the same on each node.
>>>
>>> According to "nodetool compactionhistory", compactions on this CF run
>>> (sometimes several times per day, sometimes one time per day, the last run
>>> was yesterday).
>>> We run "repair -full" regularly for this keyspace (every 24 hours on each
>>> node), because gc_grace_seconds is set to 24 hours.
>>>
>>> Should we consider increasing compaction throughput and
>>> "concurrent_compactors" (as recommended for SSDs) to keep
>>> "CompactionExecutor" pending tasks low?
>>>
>>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski 
>>> :
>>>
 Hi Dmitry,

 could you tell us which compaction strategy that table is currently
 using ?
 Also, what is the compaction max throughput and is auto-compaction
 correctly enabled on that node ?

 Did you recently run repair ?

 Thanks,

 On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
 wrote:

> Hello!
>
> Could you please give some ideas on the following problem?
>
> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>
> We've recently discovered high CPU usage on one cluster node, after
> some investigation we found that number of sstables for one CF on it is
> very big: 5800 sstables, on other nodes: 3 sstable.
>
> Data size in this keyspace was not very big ~100-200Mb per node.
>
> There is no such problem with other CFs of that keyspace.
>
> nodetool compact solved the issue as a quick-fix.
>
> But I'm wondering, what was the cause? How prevent it from repeating?
>
> --
> Best Regards,
> Dmitry Simonov
>
 --
 -
 Alexander Dejanovski
 France
 @alexanderdeja

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Dmitry Simonov
>>>
>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Best Regards,
> Dmitry Simonov
>
>
>


-- 
Best Regards,
Dmitry Simonov


Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Not sure if it differs for SASI Secondary Indexes, but my understanding is it’s 
a bad idea to use high-cardinality columns for Secondary Indexes. 
Not sure what your data model looks like, but I’d assume a UUID would have very 
high cardinality.

If that’s the case it pretty much guarantees any query on the secondary index 
will hit all the nodes, which is what you want to avoid.

Also, Secondary Indexes are generally bad for Cassandra; if you don’t need them, 
or there's a way around using them, I’d go with that.

Regards,
Eevee.

> On 5 Apr 2018, at 11:27 pm, Zsolt Pálmai  wrote:
> 
> Tried both (although with the biggest table) and the result is the same. 
> 
> I stumbled upon this jira issue: 
> https://issues.apache.org/jira/browse/CASSANDRA-12662 
> 
> Since the sasi indexes I use are only helping in debugging (for now) I 
> dropped them and it seems the tables get compacted now (at least it made it 
> further than before and the JVM metrics look healthy). 
> 
> Still this is not ideal as it would be nice to have those secondary indexes 
> :/ . 
> 
> The columns I indexed are basically uuids (so I can match the rows from other 
> systems but this is usually triggered manually so performance loss is 
> acceptable). 
> Is there a recommended index to use here? Or setting the 
> max_compaction_flush_memory_in_mb value? I saw that it can cause different 
> kind of problems... Or the default secondary index?
> 
> Thanks
> 
> 
> 
2018-04-05 15:14 GMT+02:00 Evelyn Smith:
> Probably a dumb question but it’s good to clarify.
> 
> Are you compacting the whole keyspace or are you compacting tables one at a 
> time?
> 
> 
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai wrote:
>> 
>> Hi!
>> 
>> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
>> and when running the nodetool compact command on any of the servers I get 
>> out of memory exception after a while.
>> 
>> - Before calling the compact first I did a repair and before that there was 
>> a bigger update on a lot of entries so I guess a lot of sstables were 
>> created. The repair created around ~250 pending compaction tasks, 2 of the 
>> nodes I managed to finish with upgrading to a 2xlarge machine and twice the 
>> heap (but running the compact on them manually also killed one :/ so this 
>> isn't an ideal solution)
>> 
>> Some more info: 
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped but I still got an 
>> OOM last night
>> - Concurrent compactors is set to 1 but it still happens and also tried 
>> setting throughput between 16 and 128, no changes.
>> - Storage load is 127Gb/140Gb/151Gb/155Gb
>> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90Mb but that table has only 2 sstables 
>> attached and compacts in seconds. The rest is mostly 1 line partition with a 
>> few 10KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
>> 0, 0, 0]
>> 
>> In the metrics it looks something like this before dying: 
>> https://ibb.co/kLhdXH 
>> 
>> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
>> 
>> 
>> The load is usually pretty low, the nodes are almost idling (avg 500 
>> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
>> writes) and the pending tasks is also around 0 usually.
>> 
>> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
>> cause problems? I could finish some bigger compactions where there was no 
>> index attached but I'm not sure 100% if this is the cause.
>> 
>> Thanks,
>> Zsolt
>> 
>> 
>> 
> 
> 



Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Tried both (although with the biggest table) and the result is the same.

I stumbled upon this jira issue:
https://issues.apache.org/jira/browse/CASSANDRA-12662
Since the sasi indexes I use are only helping in debugging (for now) I
dropped them and it seems the tables get compacted now (at least it made it
further than before and the JVM metrics look healthy).

Still this is not ideal as it would be nice to have those secondary indexes
:/ .

The columns I indexed are basically uuids (so I can match the rows from
other systems but this is usually triggered manually so performance loss is
acceptable).
Is there a recommended index to use here? Or setting
the max_compaction_flush_memory_in_mb value? I saw that it can cause
different kind of problems... Or the default secondary index?
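
For what it's worth, swapping a SASI index for the stock secondary index is just a
drop and re-create in cqlsh; the keyspace, table, column and index names below are
made up for illustration:

  DROP INDEX ks1.events_external_id_sasi_idx;
  CREATE INDEX events_external_id_idx ON ks1.events (external_id);

(A query through the default index on a high-cardinality uuid column will still fan
out to every node, though.)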

Thanks



2018-04-05 15:14 GMT+02:00 Evelyn Smith :

> Probably a dumb question but it’s good to clarify.
>
> Are you compacting the whole keyspace or are you compacting tables one at
> a time?
>
>
> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai  wrote:
>
> Hi!
>
> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each)
> and when running the nodetool compact command on any of the servers I get
> out of memory exception after a while.
>
> - Before calling the compact first I did a repair and before that there
> was a bigger update on a lot of entries so I guess a lot of sstables were
> created. The repair created around ~250 pending compaction tasks, 2 of the
> nodes I managed to finish with upgrading to a 2xlarge machine and twice the
> heap (but running the compact on them manually also killed one :/ so this
> isn't an ideal solution)
>
> Some more info:
> - Version is the newest 3.11.2 with java8u116
> - Using LeveledCompactionStrategy (we have mostly reads)
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got
> an OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried
> setting throughput between 16 and 128, no changes.
> - Storage load is 127Gb/140Gb/151Gb/155Gb
> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
> - The biggest partition I found was 90Mb but that table has only 2
> sstables attached and compacts in seconds. The rest is mostly 1 line
> partition with a few 10KB of data.
> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0,
> 0, 0, 0, 0]
>
> In the metrics it looks something like this before dying:
> https://ibb.co/kLhdXH
>
> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH
>
> The load is usually pretty low, the nodes are almost idling (avg 500
> reads/sec, 30-40 writes/sec with occasional few second spikes with >100
> writes) and the pending tasks is also around 0 usually.
>
> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes
> cause problems? I could finish some bigger compactions where there was no
> index attached but I'm not sure 100% if this is the cause.
>
> Thanks,
> Zsolt
>
>
>
>
>


Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Oh and second, are you attempting a major compact while you have all those 
pending compactions?

Try letting the cluster catch up on compactions. Having that many pending is 
bad.

If you have a replication factor of 3 and quorum consistency, you could go node by 
node: disable binary, raise concurrent compactors to 4, and unthrottle compactions 
by setting throughput to zero. This can help it catch up on those compactions. 
Then you can deal with trying a major compaction.
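
On 3.11 that per-node sequence would look roughly like the sketch below (verify the
exact subcommands against your nodetool version before relying on it):

  nodetool disablebinary                  # stop serving native-protocol clients on this node
  nodetool setconcurrentcompactors 4
  nodetool setcompactionthroughput 0      # 0 = unthrottled
  nodetool compactionstats                # watch the pending count drain
  nodetool enablebinary                   # re-enable clients once caught up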

Regards,
Evelyn.

> On 5 Apr 2018, at 11:14 pm, Evelyn Smith  wrote:
> 
> Probably a dumb question but it’s good to clarify.
> 
> Are you compacting the whole keyspace or are you compacting tables one at a 
> time?
> 
>> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai wrote:
>> 
>> Hi!
>> 
>> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
>> and when running the nodetool compact command on any of the servers I get 
>> out of memory exception after a while.
>> 
>> - Before calling the compact first I did a repair and before that there was 
>> a bigger update on a lot of entries so I guess a lot of sstables were 
>> created. The repair created around ~250 pending compaction tasks, 2 of the 
>> nodes I managed to finish with upgrading to a 2xlarge machine and twice the 
>> heap (but running the compact on them manually also killed one :/ so this 
>> isn't an ideal solution)
>> 
>> Some more info: 
>> - Version is the newest 3.11.2 with java8u116
>> - Using LeveledCompactionStrategy (we have mostly reads)
>> - Heap size is set to 8GB
>> - Using G1GC
>> - I tried moving the memtable out of the heap. It helped but I still got an 
>> OOM last night
>> - Concurrent compactors is set to 1 but it still happens and also tried 
>> setting throughput between 16 and 128, no changes.
>> - Storage load is 127Gb/140Gb/151Gb/155Gb
>> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
>> - The biggest partition I found was 90Mb but that table has only 2 sstables 
>> attached and compacts in seconds. The rest is mostly 1 line partition with a 
>> few 10KB of data.
>> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
>> 0, 0, 0]
>> 
>> In the metrics it looks something like this before dying: 
>> https://ibb.co/kLhdXH 
>> 
>> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
>> 
>> 
>> The load is usually pretty low, the nodes are almost idling (avg 500 
>> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
>> writes) and the pending tasks is also around 0 usually.
>> 
>> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
>> cause problems? I could finish some bigger compactions where there was no 
>> index attached but I'm not sure 100% if this is the cause.
>> 
>> Thanks,
>> Zsolt
>> 
>> 
>> 
> 



Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
Probably a dumb question but it’s good to clarify.

Are you compacting the whole keyspace or are you compacting tables one at a 
time?

> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai  wrote:
> 
> Hi!
> 
> I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each) 
> and when running the nodetool compact command on any of the servers I get out 
> of memory exception after a while.
> 
> - Before calling the compact first I did a repair and before that there was a 
> bigger update on a lot of entries so I guess a lot of sstables were created. 
> The repair created around ~250 pending compaction tasks, 2 of the nodes I 
> managed to finish with upgrading to a 2xlarge machine and twice the heap (but 
> running the compact on them manually also killed one :/ so this isn't an 
> ideal solution)
> 
> Some more info: 
> - Version is the newest 3.11.2 with java8u116
> - Using LeveledCompactionStrategy (we have mostly reads)
> - Heap size is set to 8GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped but I still got an 
> OOM last night
> - Concurrent compactors is set to 1 but it still happens and also tried 
> setting throughput between 16 and 128, no changes.
> - Storage load is 127Gb/140Gb/151Gb/155Gb
> - 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
> - The biggest partition I found was 90Mb but that table has only 2 sstables 
> attached and compacts in seconds. The rest is mostly 1 line partition with a 
> few 10KB of data.
> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 
> 0, 0, 0]
> 
> In the metrics it looks something like this before dying: 
> https://ibb.co/kLhdXH 
> 
> What the heap dump looks like of the top objects: https://ibb.co/ctkyXH 
> 
> 
> The load is usually pretty low, the nodes are almost idling (avg 500 
> reads/sec, 30-40 writes/sec with occasional few second spikes with >100 
> writes) and the pending tasks is also around 0 usually.
> 
> Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes 
> cause problems? I could finish some bigger compactions where there was no 
> index attached but I'm not sure 100% if this is the cause.
> 
> Thanks,
> Zsolt
> 
> 
> 



Re: Many SSTables only on one node

2018-04-05 Thread Evelyn Smith
It might not be what cause it here. But check your logs for anti-compactions.

> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov  wrote:
> 
> Thank you!
> I'll check this out.
> 
> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski:
> 40 pending compactions is pretty high and you should have way less than that 
> most of the time, otherwise it means that compaction is not keeping up with 
> your write rate.
> 
> If you indeed have SSDs for data storage, increase your compaction throughput 
> to 100 or 200 (depending on how the CPUs handle the load). You can experiment 
> with compaction throughput using : nodetool setcompactionthroughput 100
> 
> You can raise the number of concurrent compactors as well and set it to a 
> value between 4 and 6 if you have at least 8 cores and CPUs aren't 
> overwhelmed.
> 
> I'm not sure why you ended up with only one node having 6k SSTables and not 
> the others, but you should apply the above changes so that you can lower the 
> number of pending compactions and see if it prevents the issue from happening 
> again.
> 
> Cheers,
> 
> 
> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov wrote:
> Hi, Alexander!
> 
> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
> Current compaction throughput is 16 MB/s (default value).
> 
> We always have about 40 pending and 2 active "CompactionExecutor" tasks in 
> "tpstats".
> Mostly because of another (bigger) keyspace in this cluster.
> But the situation is the same on each node.
> 
> According to "nodetool compactionhistory", compactions on this CF run 
> (sometimes several times per day, sometimes one time per day, the last run 
> was yesterday).
> We run "repair -full" regularly for this keyspace (every 24 hours on each 
> node), because gc_grace_seconds is set to 24 hours.
> 
> Should we consider increasing compaction throughput and 
> "concurrent_compactors" (as recommended for SSDs) to keep 
> "CompactionExecutor" pending tasks low?
> 
> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski:
> Hi Dmitry,
> 
> could you tell us which compaction strategy that table is currently using ?
> Also, what is the compaction max throughput and is auto-compaction correctly 
> enabled on that node ?
> 
> Did you recently run repair ?
> 
> Thanks,
> 
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov wrote:
> Hello!
> 
> Could you please give some ideas on the following problem?
> 
> We have a cluster with 3 nodes, running Cassandra 2.2.11.
> 
> We've recently discovered high CPU usage on one cluster node, after some 
> investigation we found that number of sstables for one CF on it is very big: 
> 5800 sstables, on other nodes: 3 sstable.
> 
> Data size in this keyspace was not very big ~100-200Mb per node.
> 
> There is no such problem with other CFs of that keyspace.
> 
> nodetool compact solved the issue as a quick-fix.
> 
> But I'm wondering, what was the cause? How prevent it from repeating?
> 
> -- 
> Best Regards,
> Dmitry Simonov
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 
> 
> 
> -- 
> Best Regards,
> Dmitry Simonov
> -- 
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 
> 
> 
> -- 
> Best Regards,
> Dmitry Simonov



OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Hi!

I have a setup with 4 AWS nodes (m4xlarge - 4 cpu, 16gb ram, 1TB ssd each)
and when running the nodetool compact command on any of the servers I get
out of memory exception after a while.

- Before calling compact I first did a repair, and before that there was
a bigger update on a lot of entries, so I guess a lot of sstables were
created. The repair created around ~250 pending compaction tasks; 2 of the
nodes I managed to finish by upgrading to a 2xlarge machine with twice the
heap (but running the compact on them manually also killed one :/ so this
isn't an ideal solution)

Some more info:
- Version is the newest 3.11.2 with java8u116
- Using LeveledCompactionStrategy (we have mostly reads)
- Heap size is set to 8GB
- Using G1GC
- I tried moving the memtable out of the heap. It helped but I still got an
OOM last night
- Concurrent compactors is set to 1 but it still happens and also tried
setting throughput between 16 and 128, no changes.
- Storage load is 127Gb/140Gb/151Gb/155Gb
- 1 keyspace, 16 tables but there are a few SASI indexes on big tables.
- The biggest partition I found was 90Mb but that table has only 2 sstables
attached and compacts in seconds. The rest is mostly single-row partitions with
a few tens of KB of data.
- Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0,
0, 0, 0]

In the metrics it looks something like this before dying:
https://ibb.co/kLhdXH

What the heap dump looks like of the top objects: https://ibb.co/ctkyXH

The load is usually pretty low, the nodes are almost idling (avg 500
reads/sec, 30-40 writes/sec with occasional few second spikes with >100
writes) and the pending tasks is also around 0 usually.

Any ideas? I'm starting to run out of ideas. Maybe the secondary indexes
cause problems? I could finish some bigger compactions where there was no
index attached but I'm not sure 100% if this is the cause.
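
(For anyone reproducing this: the per-level counts and pending-task figures above
can also be watched with nodetool; the keyspace/table names in this sketch are
placeholders:)

  nodetool compactionstats                # active + pending compactions
  nodetool tablestats ks1.big_table       # includes "SSTables in each level" for LCS tables
  nodetool tpstats | grep -i compaction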

Thanks,
Zsolt


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Thank you!
I'll check this out.

2018-04-05 15:00 GMT+05:00 Alexander Dejanovski :

> 40 pending compactions is pretty high and you should have way less than
> that most of the time, otherwise it means that compaction is not keeping up
> with your write rate.
>
> If you indeed have SSDs for data storage, increase your compaction
> throughput to 100 or 200 (depending on how the CPUs handle the load). You
> can experiment with compaction throughput using : nodetool
> setcompactionthroughput 100
>
> You can raise the number of concurrent compactors as well and set it to a
> value between 4 and 6 if you have at least 8 cores and CPUs aren't
> overwhelmed.
>
> I'm not sure why you ended up with only one node having 6k SSTables and
> not the others, but you should apply the above changes so that you can
> lower the number of pending compactions and see if it prevents the issue
> from happening again.
>
> Cheers,
>
>
> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov 
> wrote:
>
>> Hi, Alexander!
>>
>> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
>> Current compaction throughput is 16 MB/s (default value).
>>
>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>> in "tpstats".
>> Mostly because of another (bigger) keyspace in this cluster.
>> But the situation is the same on each node.
>>
>> According to "nodetool compactionhistory", compactions on this CF run
>> (sometimes several times per day, sometimes one time per day, the last run
>> was yesterday).
>> We run "repair -full" regularly for this keyspace (every 24 hours on each
>> node), because gc_grace_seconds is set to 24 hours.
>>
>> Should we consider increasing compaction throughput and
>> "concurrent_compactors" (as recommended for SSDs) to keep
>> "CompactionExecutor" pending tasks low?
>>
>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski :
>>
>>> Hi Dmitry,
>>>
>>> could you tell us which compaction strategy that table is currently
>>> using ?
>>> Also, what is the compaction max throughput and is auto-compaction
>>> correctly enabled on that node ?
>>>
>>> Did you recently run repair ?
>>>
>>> Thanks,
>>>
>>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
>>> wrote:
>>>
 Hello!

 Could you please give some ideas on the following problem?

 We have a cluster with 3 nodes, running Cassandra 2.2.11.

 We've recently discovered high CPU usage on one cluster node, after
 some investigation we found that number of sstables for one CF on it is
 very big: 5800 sstables, on other nodes: 3 sstable.

 Data size in this keyspace was not very big ~100-200Mb per node.

 There is no such problem with other CFs of that keyspace.

 nodetool compact solved the issue as a quick-fix.

 But I'm wondering, what was the cause? How prevent it from repeating?

 --
 Best Regards,
 Dmitry Simonov

>>> --
>>> -
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Best Regards,
Dmitry Simonov


Re: Many SSTables only on one node

2018-04-05 Thread Alexander Dejanovski
40 pending compactions is pretty high and you should have way less than
that most of the time, otherwise it means that compaction is not keeping up
with your write rate.

If you indeed have SSDs for data storage, increase your compaction
throughput to 100 or 200 (depending on how the CPUs handle the load). You
can experiment with compaction throughput using : nodetool
setcompactionthroughput 100

You can raise the number of concurrent compactors as well and set it to a
value between 4 and 6 if you have at least 8 cores and CPUs aren't
overwhelmed.
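
A minimal sketch of those two knobs (the cassandra.yaml values take effect on
restart; the nodetool command applies immediately but is not persisted):

  # cassandra.yaml
  concurrent_compactors: 4
  compaction_throughput_mb_per_sec: 100

  # or at runtime, per node
  nodetool setcompactionthroughput 100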

I'm not sure why you ended up with only one node having 6k SSTables and not
the others, but you should apply the above changes so that you can lower
the number of pending compactions and see if it prevents the issue from
happening again.

Cheers,


On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov 
wrote:

> Hi, Alexander!
>
> SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
> Current compaction throughput is 16 MB/s (default value).
>
> We always have about 40 pending and 2 active "CompactionExecutor" tasks in
> "tpstats".
> Mostly because of another (bigger) keyspace in this cluster.
> But the situation is the same on each node.
>
> According to "nodetool compactionhistory", compactions on this CF run
> (sometimes several times per day, sometimes one time per day, the last run
> was yesterday).
> We run "repair -full" regulary for this keyspace (every 24 hours on each
> node), because gc_grace_seconds is set to 24 hours.
>
> Should we consider increasing compaction throughput and
> "concurrent_compactors" (as recommended for SSDs) to keep
> "CompactionExecutor" pending tasks low?
>
> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski :
>
>> Hi Dmitry,
>>
>> could you tell us which compaction strategy that table is currently using
>> ?
>> Also, what is the compaction max throughput and is auto-compaction
>> correctly enabled on that node ?
>>
>> Did you recently run repair ?
>>
>> Thanks,
>>
>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
>> wrote:
>>
>>> Hello!
>>>
>>> Could you please give some ideas on the following problem?
>>>
>>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>>
>>> We've recently discovered high CPU usage on one cluster node, after some
>>> investigation we found that number of sstables for one CF on it is very
>>> big: 5800 sstables, on other nodes: 3 sstable.
>>>
>>> Data size in this keyspace was not very big ~100-200Mb per node.
>>>
>>> There is no such problem with other CFs of that keyspace.
>>>
>>> nodetool compact solved the issue as a quick-fix.
>>>
>>> But I'm wondering, what was the cause? How prevent it from repeating?
>>>
>>> --
>>> Best Regards,
>>> Dmitry Simonov
>>>
>> --
>> -
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
>
> --
> Best Regards,
> Dmitry Simonov
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hi, Alexander!

SizeTieredCompactionStrategy is used for all CFs in problematic keyspace.
Current compaction throughput is 16 MB/s (default value).

We always have about 40 pending and 2 active "CompactionExecutor" tasks in
"tpstats".
Mostly because of another (bigger) keyspace in this cluster.
But the situation is the same on each node.

According to "nodetool compactionhistory", compactions on this CF run
(sometimes several times per day, sometimes one time per day, the last run
was yesterday).
We run "repair -full" regularly for this keyspace (every 24 hours on each
node), because gc_grace_seconds is set to 24 hours.

Should we consider increasing compaction throughput and
"concurrent_compactors" (as recommended for SSDs) to keep
"CompactionExecutor" pending tasks low?

2018-04-05 14:09 GMT+05:00 Alexander Dejanovski :

> Hi Dmitry,
>
> could you tell us which compaction strategy that table is currently using ?
> Also, what is the compaction max throughput and is auto-compaction
> correctly enabled on that node ?
>
> Did you recently run repair ?
>
> Thanks,
>
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
> wrote:
>
>> Hello!
>>
>> Could you please give some ideas on the following problem?
>>
>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>
>> We've recently discovered high CPU usage on one cluster node, after some
>> investigation we found that number of sstables for one CF on it is very
>> big: 5800 sstables, on other nodes: 3 sstable.
>>
>> Data size in this keyspace was not very big ~100-200Mb per node.
>>
>> There is no such problem with other CFs of that keyspace.
>>
>> nodetool compact solved the issue as a quick-fix.
>>
>> But I'm wondering, what was the cause? How prevent it from repeating?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Best Regards,
Dmitry Simonov


Re: Many SSTables only on one node

2018-04-05 Thread Alexander Dejanovski
Hi Dmitry,

could you tell us which compaction strategy that table is currently using ?
Also, what is the compaction max throughput and is auto-compaction
correctly enabled on that node ?

Did you recently run repair ?

Thanks,

On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov 
wrote:

> Hello!
>
> Could you please give some ideas on the following problem?
>
> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>
> We've recently discovered high CPU usage on one cluster node, after some
> investigation we found that number of sstables for one CF on it is very
> big: 5800 sstables, on other nodes: 3 sstable.
>
> Data size in this keyspace was not very big ~100-200Mb per node.
>
> There is no such problem with other CFs of that keyspace.
>
> nodetool compact solved the issue as a quick-fix.
>
> But I'm wondering, what was the cause? How prevent it from repeating?
>
> --
> Best Regards,
> Dmitry Simonov
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Many SSTables only on one node

2018-04-05 Thread Dmitry Simonov
Hello!

Could you please give some ideas on the following problem?

We have a cluster with 3 nodes, running Cassandra 2.2.11.

We've recently discovered high CPU usage on one cluster node. After some
investigation we found that the number of sstables for one CF on it is very
big: 5800 sstables; on the other nodes: 3 sstables.

Data size in this keyspace was not very big ~100-200Mb per node.

There is no such problem with other CFs of that keyspace.

nodetool compact solved the issue as a quick-fix.

But I'm wondering, what was the cause? How can I prevent it from repeating?

-- 
Best Regards,
Dmitry Simonov