Re: SSTable count at 10K during repair (won't decrease)

2016-05-20 Thread Fabrice Facorat
Are you using repairParallelism = sequential or parallel?

As said by Alain:
- try decreasing streamthroughput, to avoid flooding nodes with lots of
(small) streamed SSTables (example commands below)
- if you are using parallel repair, switch to sequential
- don't start too many repairs simultaneously
- do you really need to use LCS for your tables? LCS makes the problem even
worse. Use it sparingly ;)
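For example, the first two points map to something like the commands below.
The throughput value is only illustrative (pick one that fits your network
and disks), and on 2.1 a plain 'nodetool repair' is already sequential; it
is -par that forces parallel:

# lower the streaming cap from the 200 Mb/s default to something gentler
nodetool setstreamthroughput 50

# full, sequential range repair of one table (2.1 default behaviour; avoid -par)
nodetool repair foo bar

In Reaper, the same choice is the repairParallelism option on the repair run.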



Re: SSTable count at 10K during repair (won't decrease)

2016-05-20 Thread Alain RODRIGUEZ
Thanks for the detailed information, which definitely deserves an answer,
even if a bit late.

Any suggestions as what I should look at ?


Is the node receiving a lot of streams during the repair? What do these
commands output?

'nodetool netstats -H'
or
'nodetool netstats -H | grep -v 100%'

and

'iftop'

This node might be receiving data faster than it can handle. I opened an
issue about this topic a while ago, but never worked on fixing it, and it was
never picked up either: https://issues.apache.org/jira/browse/CASSANDRA-9509.

Basically, the default values might not work for you. You can try reducing
the streaming speed if this is still happening:

'nodetool setstreamthroughput X', where X can be anything from 1 up to your
current value. The right setting depends heavily on your network and disk
throughput.

This needs to be applied on all the other nodes (I usually apply it to all
the nodes at once through
https://github.com/arodrime/cassandra-tools/tree/master/rolling-ssh)
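A minimal sketch of such a rolling change, assuming password-less SSH and a
hosts.txt file listing one node per line (the rolling-ssh tool above is a
more complete version of the same idea):

# apply the new streaming cap on every node in the cluster
for host in $(cat hosts.txt); do
    ssh "$host" 'nodetool setstreamthroughput 50'
done

Note this only changes the runtime value; to make it permanent, set
stream_throughput_outbound_megabits_per_sec in cassandra.yaml.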

Even If I set the compactionthroughput 0 (disable throttling) the SSTable
> count stays around 10K.
>

Since you are on SSDs, if the nodes are still happy this way, feel free to
keep this throttle disabled.
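For the record, that throttle can be toggled at runtime (values are in MB/s,
0 disables it):

nodetool setcompactionthroughput 0     # disable the cap
nodetool setcompactionthroughput 16    # back to the 2.1 default

The permanent setting is compaction_throughput_mb_per_sec in cassandra.yaml.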

I have seen many people on this list complaining about this issue. If you
think streaming is not causing it, you might want to search Jira for an
existing ticket or create a new one.

C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



SSTable count at 10K during repair (won't decrease)

2016-05-06 Thread Jean-Francois Gosselin
- Cassandra 2.1.13 
- SSDs
- LeveledCompactionStrategy   
- Range repair (not incremental) with Spotify's Reaper 
https://github.com/spotify/cassandra-reaper

Problem: When we run a repair job, sometimes the SSTable count goes to 10K on
one of the nodes (not always the same node). Reaper is smart enough to postpone
the repair on this node since the number of pending compactions is > 20, but
the number of SSTables stays around 10K.
Even if I set compactionthroughput to 0 (disable throttling), the SSTable
count stays around 10K.

Workaround: If we abort the repair and restart the node, it quickly (within 15
minutes) goes back to ~200 SSTables ...
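(Roughly, and assuming a service-managed install, the abort/restart looks
like this: pause or abort the run in Reaper, then

nodetool stop VALIDATION          # kill in-flight validation compactions
sudo service cassandra restart

)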

Any suggestions as to what I should look at?

When it occurs, I've noticed that nodetool compactionstats and cfstats (on the
table with 10K SSTables) take minutes to return a result.

I thought the issue might be related to
https://issues.apache.org/jira/browse/CASSANDRA-10766, as I see
MemtablePostFlush waiting on the countdown latch, but the Pending count for
MemtablePostFlush goes up and down according to tpstats.
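(For reference, a simple way to watch that pool over time, assuming nodetool
is on the PATH:

while true; do nodetool tpstats | grep -E 'Pool Name|MemtablePostFlush'; sleep 60; done

)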

Complete stack trace : http://pastebin.com/K1r3CUff

I took tpstats samples (roughly every minute). Only these pools are not at 0
(Active/Pending).

Pool Name                    Active   Pending      Completed   Blocked  All 
time blocked
MemtableFlushWriter               2         2         139864         0          
       0
MemtablePostFlush                 1        13         223714         0          
       0
CompactionExecutor               10        10         804964         0          
       0

MemtableFlushWriter               4         4         139889         0          
       0
MemtablePostFlush                 1        12         223744         0          
       0
CompactionExecutor               12        12         805365         0          
       0

MemtableFlushWriter               5         5         139896         0          
       0
MemtablePostFlush                 1        10         223755         0          
       0
CompactionExecutor                9         9         805503         0          
       0

MemtableFlushWriter               4         4         139907         0          
       0
MemtablePostFlush                 1        13         223762         0          
       0
CompactionExecutor                9         9         805703         0          
       0


MemtableFlushWriter               5         5         139927         0          
       0
MemtablePostFlush                 1        14         223783         0          
       0
CompactionExecutor               10        10         805971         0          
       0

MemtableFlushWriter               7         7         139956         0          
       0
MemtablePostFlush                 1        23         223806         0          
       0
CompactionExecutor               10        10         806428         0          
       0

nodetool compactionstats shows 66 pending tasks

Keyspace: foo
        Read Count: 6308735
        Read Latency: 12.132909585836147 ms.
        Write Count: 15394697
        Write Latency: 0.09054346675351908 ms.
        Pending Flushes: 15
                Table: bar
                SSTable count: 10326
                SSTables in each level: [10090/4, 10, 106/100, 112, 0, 0, 0, 0, 
0]
                Space used (live): 69204087872
                Space used (total): 69206400092
                Space used by snapshots (total): 2708047105
                Off heap memory used (total): 35230672
                SSTable Compression Ratio: 0.339043411676821
                Number of keys (estimate): 1601158
                Memtable cell count: 86524
                Memtable data size: 6508214
                Memtable off heap memory used: 0
                Memtable switch count: 22719
                Local read count: 6310549
                Local read latency: 12.135 ms
                Local write count: 15397653
                Local write latency: 0.091 ms
                Pending flushes: 10
                Bloom filter false positives: 2282107
                Bloom filter false ratio: 0.38494
                Bloom filter space used: 3244792
                Bloom filter off heap memory used: 3162168
                Index summary off heap memory used: 3348360
                Compression metadata off heap memory used: 28720144
                Compacted partition minimum bytes: 87
                Compacted partition maximum bytes: 2816159
                Compacted partition mean bytes: 69860
                Average live cells per slice (last five minutes): 
817.6059838850788
                Maximum live cells per slice (last five minutes): 5002.0
                Average tombstones per slice (last five minutes): 0.0
                Maximum tombstones per slice (last five minutes): 0.0

Thanks

J-F Gosselin