Probably a dumb question but it’s good to clarify. Are you compacting the whole keyspace or are you compacting tables one at a time?
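To make the distinction concrete, here is a minimal sketch of what I mean by "one table at a time". It only builds the `nodetool compact <keyspace> <table>` command lines and does not run them. The helper name and the keyspace/table names are hypothetical placeholders, not taken from your cluster.

```python
from typing import List


def per_table_compact_commands(keyspace: str, tables: List[str]) -> List[List[str]]:
    """Build one `nodetool compact <keyspace> <table>` invocation per table.

    Compacting table by table (instead of the whole keyspace in one call)
    makes it easier to see which table is running when memory climbs.
    """
    return [["nodetool", "compact", keyspace, table] for table in tables]


if __name__ == "__main__":
    # Hypothetical names; substitute your own keyspace and tables.
    for cmd in per_table_compact_commands("my_keyspace", ["users", "events"]):
        print(" ".join(cmd))
```

If you run the printed commands one by one, watching the heap between runs, it should show whether a specific table (for example one with a SASI index) is the one that triggers the OOM.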
> On 5 Apr 2018, at 9:47 pm, Zsolt Pálmai <zpal...@gmail.com> wrote:
>
> Hi!
>
> I have a setup with 4 AWS nodes (m4.xlarge: 4 CPUs, 16 GB RAM, 1 TB SSD each), and when I run the nodetool compact command on any of the servers I get an out-of-memory exception after a while.
>
> - Before calling compact I ran a repair, and before that there was a bigger update on a lot of entries, so I guess a lot of SSTables were created. The repair created around 250 pending compaction tasks. I managed to finish 2 of the nodes by upgrading to a 2xlarge machine with twice the heap (but running compact on them manually also killed one :/ so this isn't an ideal solution).
>
> Some more info:
> - Version is the newest, 3.11.2, with java8u116
> - Using LeveledCompactionStrategy (we have mostly reads)
> - Heap size is set to 8 GB
> - Using G1GC
> - I tried moving the memtable out of the heap. It helped, but I still got an OOM last night
> - Concurrent compactors is set to 1 but it still happens; I also tried setting throughput between 16 and 128, no change
> - Storage load is 127 GB / 140 GB / 151 GB / 155 GB
> - 1 keyspace, 16 tables, but there are a few SASI indexes on big tables
> - The biggest partition I found was 90 MB, but that table has only 2 SSTables attached and compacts in seconds. The rest are mostly single-row partitions with a few tens of KB of data
> - Worst SSTable case: SSTables in each level: [1, 20/10, 106/100, 15, 0, 0, 0, 0, 0]
>
> In the metrics it looks something like this before dying: https://ibb.co/kLhdXH
>
> What the top objects in the heap dump look like: https://ibb.co/ctkyXH
>
> The load is usually pretty low; the nodes are almost idling (avg 500 reads/sec, 30-40 writes/sec, with occasional few-second spikes of >100 writes), and the number of pending tasks is usually around 0.
>
> Any ideas? I'm starting to run out of them. Maybe the secondary indexes cause problems? I could finish some bigger compactions where there was no index attached, but I'm not 100% sure this is the cause.
>
> Thanks,
> Zsolt