Re: SSTable count at 10K during repair (won't decrease)
Are you using repairParallelism = sequential or parallel? As Alain said:

- Try decreasing streamthroughput to avoid flooding nodes with lots of (small) streamed SSTables.
- If you are using parallel repair, switch to sequential.
- Don't start too many repairs simultaneously.
- Do you really need to use LCS for these tables? LCS makes the problem even worse. Use it sparingly ;)

2016-05-06 18:05 GMT+02:00 Jean-Francois Gosselin:
> [original message trimmed; quoted in full below]
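To see why LCS amplifies the problem: repair streams land in L0, and each streamed SSTable then has to be compacted up through the levels. A rough sketch of the per-level capacity (illustrative only, not from the thread; assumes the default LCS fanout of 10 and the default 160 MB sstable_size_in_mb):

```shell
# Rough LCS sizing sketch (assumed defaults: fanout 10, 160 MB sstables).
# Each level n >= 1 holds about sstable_size_mb * 10^n of data, so a flood
# of small streamed sstables stuck in L0 (e.g. 10090 files vs. the usual
# handful) implies a long chain of pending compactions to drain.
level_capacity_mb() {
  level=$1
  size_mb=$2
  cap=$size_mb
  i=0
  while [ "$i" -lt "$level" ]; do
    cap=$((cap * 10))
    i=$((i + 1))
  done
  echo "$cap"
}

for lvl in 1 2 3; do
  echo "L$lvl capacity: $(level_capacity_mb "$lvl" 160) MB"
done
# L1 capacity: 1600 MB
# L2 capacity: 16000 MB
# L3 capacity: 160000 MB
```

This is why the cfstats line "SSTables in each level: [10090/4, ...]" is the smoking gun: L0 is over its target by three orders of magnitude.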
Re: SSTable count at 10K during repair (won't decrease)
Thanks for the detailed information; it definitely deserves an answer, even a bit late.

> Any suggestions as to what I should look at?

Is the node receiving a lot of streams during the repair? What is the output of 'nodetool netstats -H' (or 'nodetool netstats -H | grep -v 100%') and 'iftop'? This node might be receiving data faster than it can handle. I opened an issue about this topic a while ago, but never worked on a fix, and it was never picked up either: https://issues.apache.org/jira/browse/CASSANDRA-9509. Basically, the default values might not work for you. You can try reducing the streaming speed if this is still happening: nodetool setstreamthroughput X. You can try any value from 1 up to your current value; it depends heavily on your network and disk throughput. This needs to be applied on all the other nodes (I used to apply it to all nodes at once through https://github.com/arodrime/cassandra-tools/tree/master/rolling-ssh).

> Even if I set compactionthroughput to 0 (disable throttling) the SSTable
> count stays around 10K.

Using SSDs, if the nodes are still happy this way, feel free to keep this throttle disabled. I have seen many people on this list complaining about this issue. If you think streaming is not causing it, you might want to search Jira for the issue or create a new ticket.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-06 18:05 GMT+02:00 Jean-Francois Gosselin:
> [original message trimmed; quoted in full below]
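The "apply on all nodes" step can be sketched like this (the host names and the 50 Mb/s value are placeholders, not from the thread; the echo is a dry run standing in for the real ssh call):

```shell
# Sketch: lower stream throughput cluster-wide before the repair.
# Hosts and throughput are hypothetical; swap the echo for the
# commented ssh line to actually apply it.
set_stream_throughput() {
  throughput=$1
  shift
  for host in "$@"; do
    echo "$host: nodetool setstreamthroughput $throughput"
    # ssh "$host" "nodetool setstreamthroughput $throughput"
  done
}

set_stream_throughput 50 node1 node2 node3
```

Note that setstreamthroughput is not persistent: run it again with a higher value once the repair finishes (or set stream_throughput_outbound_megabits_per_sec in cassandra.yaml), so bootstrap and rebuild streaming are not left throttled.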
SSTable count at 10K during repair (won't decrease)
- Cassandra 2.1.13
- SSDs
- LeveledCompactionStrategy
- Range repair (not incremental) with Spotify's Reaper: https://github.com/spotify/cassandra-reaper

Problem: When we run a repair job, the SSTable count sometimes goes to 10K on one of the nodes (not always the same node). Reaper is smart enough to postpone the repair on this node since the number of pending compactions is > 20, but the number of SSTables stays around 10K. Even if I set compactionthroughput to 0 (disable throttling), the SSTable count stays around 10K.

Workaround: If we abort the repair and restart the node, it quickly (in 15 minutes) goes back to 200 SSTables...

Any suggestions as to what I should look at?

When it occurs, I've noticed that nodetool compactionstats and cfstats (on the table with 10K SSTables) take minutes to return a result.

I thought the issue might be related to https://issues.apache.org/jira/browse/CASSANDRA-10766, as I see MemtablePostFlush waiting on the countdown latch, but the pending MemtablePostFlush count is going up and down according to tpstats.

Complete stack trace: http://pastebin.com/K1r3CUff

I took some tpstats samples (roughly every minute). Only these pools are not at 0 (Active/Pending).
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
MemtableFlushWriter               2         2      139864         0                  0
MemtablePostFlush                 1        13      223714         0                  0
CompactionExecutor               10        10      804964         0                  0

MemtableFlushWriter               4         4      139889         0                  0
MemtablePostFlush                 1        12      223744         0                  0
CompactionExecutor               12        12      805365         0                  0

MemtableFlushWriter               5         5      139896         0                  0
MemtablePostFlush                 1        10      223755         0                  0
CompactionExecutor                9         9      805503         0                  0

MemtableFlushWriter               4         4      139907         0                  0
MemtablePostFlush                 1        13      223762         0                  0
CompactionExecutor                9         9      805703         0                  0

MemtableFlushWriter               5         5      139927         0                  0
MemtablePostFlush                 1        14      223783         0                  0
CompactionExecutor               10        10      805971         0                  0

MemtableFlushWriter               7         7      139956         0                  0
MemtablePostFlush                 1        23      223806         0                  0
CompactionExecutor               10        10      806428         0                  0

nodetool compactionstats shows 66 pending tasks.

Keyspace: foo
    Read Count: 6308735
    Read Latency: 12.132909585836147 ms
    Write Count: 15394697
    Write Latency: 0.09054346675351908 ms
    Pending Flushes: 15
    Table: bar
        SSTable count: 10326
        SSTables in each level: [10090/4, 10, 106/100, 112, 0, 0, 0, 0, 0]
        Space used (live): 69204087872
        Space used (total): 69206400092
        Space used by snapshots (total): 2708047105
        Off heap memory used (total): 35230672
        SSTable Compression Ratio: 0.339043411676821
        Number of keys (estimate): 1601158
        Memtable cell count: 86524
        Memtable data size: 6508214
        Memtable off heap memory used: 0
        Memtable switch count: 22719
        Local read count: 6310549
        Local read latency: 12.135 ms
        Local write count: 15397653
        Local write latency: 0.091 ms
        Pending flushes: 10
        Bloom filter false positives: 2282107
        Bloom filter false ratio: 0.38494
        Bloom filter space used: 3244792
        Bloom filter off heap memory used: 3162168
        Index summary off heap memory used: 3348360
        Compression metadata off heap memory used: 28720144
        Compacted partition minimum bytes: 87
        Compacted partition maximum bytes: 2816159
        Compacted partition mean bytes: 69860
        Average live cells per slice (last five minutes): 817.6059838850788
        Maximum live cells per slice (last five minutes): 5002.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Thanks,
J-F Gosselin
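For anyone sampling tpstats the same way, a small filter makes the three pools called out above easier to watch over time (a sketch; the pool names come from the output above, the loop interval is arbitrary):

```shell
# Keep only the pools that were non-zero in the samples above.
filter_pools() {
  grep -E 'MemtableFlushWriter|MemtablePostFlush|CompactionExecutor'
}

# Typical use (assumes nodetool on PATH; sample roughly every minute):
#   while true; do date; nodetool tpstats | filter_pools; sleep 60; done

# Demo on a captured line:
echo "MemtablePostFlush   1   13   223714   0   0" | filter_pools
```

A steadily growing Pending column on MemtablePostFlush between samples, rather than the up-and-down pattern above, would point more strongly at CASSANDRA-10766.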