I can vouch for TWCS...we switched from DTCS to TWCS using Jeff's plugin w/ Cassandra 3.0.5 and just upgraded to 3.0.8 today and switched over to the built-in version of TWCS.
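For anyone else making the same switch: once you're on 3.0.8 it's just a compaction-options change per table. A minimal sketch (keyspace/table names and the one-day window are hypothetical; size the window for your own retention):

```cql
-- Hedged sketch: move a table from DTCS to the built-in TWCS.
-- Names and the daily window are placeholders, not from this thread.
ALTER TABLE my_ks.my_table WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
};
```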
-J

On Mon, Jul 11, 2016 at 1:38 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> DTCS is deprecated in favor of TWCS in new versions, yes.
>
> Worth mentioning that you can NOT disable blocking read repair, which comes
> naturally if you use CL > ONE.
>
> > Also instead of major compactions (which come with their own set of issues
> > / tradeoffs) you can think of a script smartly using sstablemetadata to
> > find the sstables holding too many tombstones and running single-SSTable
> > compactions on them through JMX and user-defined compactions. Meanwhile,
> > if you want to do it manually, you could do it with something like this to
> > know the tombstone ratio of the biggest sstables:
>
> The tombstone compaction options basically do this for you with the right
> settings (unchecked_tombstone_compaction = true, set the threshold to 85% or
> so; don't try to get clever and set it to something very close to 99%, the
> estimated tombstone ratio isn't that accurate).
>
> - Jeff
>
> *From: *Alain RODRIGUEZ <arodr...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, July 11, 2016 at 1:05 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: DTCS SSTable count issue
>
> @Jeff
>
> Rather than being an alternative, isn't your compaction strategy going to
> deprecate (and finally replace) DTCS? That was my understanding from
> ticket CASSANDRA-9666.
>
> @Riccardo
>
> If you are interested in Jeff's TWCS, I believe it was actually introduced
> in 3.0.8, not 3.0.7:
> https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt#L28
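In table-option form, the settings Jeff describes look roughly like this (a sketch, not taken from the thread; the table name is hypothetical):

```cql
-- Hedged sketch of the tombstone compaction options Jeff suggests:
-- single-SSTable tombstone compactions without overlap checks, triggered
-- only when ~85% of an sstable's data is estimated droppable.
ALTER TABLE my_ks.my_table WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.85'
};
```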
> Anyway, you can use it in any recent version, as compaction strategies are
> pluggable.
>
> > What concerns me is that I have a high tombstone read count despite these
> > being insert-only tables. Compacting the table makes the tombstone issue
> > disappear. Yes, we are using TTL to expire data after 3 months, and I
> > have not touched the GC grace period.
>
> I observed the same issue recently, and I am confident that TWCS will solve
> this tombstone issue, but I have not tested it on my side so far. Meanwhile,
> be sure you have disabled any "read repair" on tables using DTCS, and maybe
> hints as well. It is a hard decision to take, as you'll lose 2 out of 3
> anti-entropy systems, but DTCS behaves badly with those options turned on
> (TWCS is fine with them). The last anti-entropy mechanism is a full repair,
> which you might already not be running since you only do inserts...
>
> Also instead of major compactions (which come with their own set of issues /
> tradeoffs) you can think of a script smartly using sstablemetadata to find
> the sstables holding too many tombstones and running single-SSTable
> compactions on them through JMX and user-defined compactions. Meanwhile, if
> you want to do it manually, you could do it with something like this to know
> the tombstone ratio of the biggest sstables (note the single quotes around
> the awk programs, so the shell does not expand $1 and $2):
>
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $1}' && \
> du -sh /path_to_a_table/* | sort -h | tail -20 | awk '{print $2}' | \
> xargs sstablemetadata | grep tombstones
>
> And something like this to run a user-defined compaction on the ones you
> chose (big sstables with a high tombstone ratio):
>
> echo "run -b org.apache.cassandra.db:type=CompactionManager \
> forceUserDefinedCompaction <Data_db_file_name_without_path>" | \
> java -jar jmxterm-version.jar -l <ip>:<jmx_port>
>
> *note:* you have to download jmxterm (or use any other JMX tool).
>
> Did you give a try to unchecked_tombstone_compaction as well (compaction
> options at the table level)?
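The du / sstablemetadata pipeline above can be wrapped in a tiny helper. A sketch, assuming only that sstablemetadata prints a line like `Estimated droppable tombstones: 0.85`; the table path in the usage comment is hypothetical:

```shell
# Helper: read sstablemetadata output on stdin and print just the
# estimated droppable-tombstone ratio.
tombstone_ratio() {
    awk -F': ' '/roppable tombstones/ {print $2; exit}'
}

# Example wiring over the 20 biggest -Data.db files (path hypothetical):
#   for f in $(du -s /path_to_a_table/*-Data.db | sort -n | tail -20 | awk '{print $2}'); do
#       echo "$f $(sstablemetadata "$f" | tombstone_ratio)"
#   done
```

Files that come out with a high ratio are the candidates for the user-defined compaction via JMX.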
> Feel free to set this one to true. I think it could be the default. It is
> safe as long as your machines have some more resources available (not that
> much). That's the first thing I would do.
>
> Also, if you use TTL only, feel free to reduce gc_grace_seconds; this will
> probably help with having tombstones removed. I would start with the other
> solutions first. Keep in mind that if someday you perform deletes, this
> setting could produce zombies (data coming back) if you don't run repair
> within gc_grace_seconds on the entire ring.
>
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-07-07 19:25 GMT+02:00 Jeff Jirsa <jeff.ji...@crowdstrike.com>:
>
> > 48 sstables isn't unreasonable in a DTCS table. It will continue to grow
> > over time, but ideally data will expire as it nears your 90-day TTL, and
> > those sstables should start dropping away as they age.
> >
> > 3.0.7 introduces an alternative to DTCS you may find easier to use,
> > called TWCS. It will almost certainly help address the growing sstable
> > count.
> >
> > *From: *Riccardo Ferrari <ferra...@gmail.com>
> > *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> > *Date: *Thursday, July 7, 2016 at 6:49 AM
> > *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> > *Subject: *DTCS SSTable count issue
> >
> > Hi everyone,
> >
> > This is my first question; apologies if I'm doing something wrong.
> >
> > I have a small Cassandra cluster built on 3 nodes. Originally born as a
> > 2.0.x cluster, it was upgraded to 2.0.15, then 2.1.13, then 3.0.4, and
> > recently 3.0.6.
> > Ubuntu is the OS.
> >
> > There are a few tables that have DateTieredCompactionStrategy and are
> > suffering from a constantly growing SSTable count. I have the feeling
> > this has something to do with the upgrade; however, I need some hints on
> > how to debug this issue.
> >
> > Tables are created like:
> >
> > CREATE TABLE <table> (
> >     ...
> >     PRIMARY KEY (...)
> > ) WITH CLUSTERING ORDER BY (...)
> >     AND bloom_filter_fp_chance = 0.01
> >     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> >     AND comment = ''
> >     AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
> >     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> >     AND crc_check_chance = 1.0
> >     AND dclocal_read_repair_chance = 0.1
> >     AND default_time_to_live = 7776000
> >     AND gc_grace_seconds = 864000
> >     AND max_index_interval = 2048
> >     AND memtable_flush_period_in_ms = 0
> >     AND min_index_interval = 128
> >     AND read_repair_chance = 0.0
> >     AND speculative_retry = '99PERCENTILE';
> >
> > and this is the "nodetool cfstats" output for that table:
> >
> > Read Count: 39
> > Read Latency: 85.03307692307692 ms.
> > Write Count: 9845275
> > Write Latency: 0.09604882382665797 ms.
> > Pending Flushes: 0
> > Table: <table>
> > SSTable count: 48
> > Space used (live): 19566109394
> > Space used (total): 19566109394
> > Space used by snapshots (total): 109796505570
> > Off heap memory used (total): 11317941
> > SSTable Compression Ratio: 0.22632301701483284
> > Number of keys (estimate): 2557
> > Memtable cell count: 0
> > Memtable data size: 0
> > Memtable off heap memory used: 0
> > Memtable switch count: 828
> > Local read count: 39
> > Local read latency: 93.051 ms
> > Local write count: 9845275
> > Local write latency: 0.106 ms
> > Pending flushes: 0
> > Bloom filter false positives: 2
> > Bloom filter false ratio: 0.00000
> > Bloom filter space used: 10200
> > Bloom filter off heap memory used: 9816
> > Index summary off heap memory used: 4677
> > Compression metadata off heap memory used: 11303448
> > Compacted partition minimum bytes: 150
> > Compacted partition maximum bytes: 4139110981
> > Compacted partition mean bytes: 13463937
> > Average live cells per slice (last five minutes): 59.69230769230769
> > Maximum live cells per slice (last five minutes): 149
> > Average tombstones per slice (last five minutes): 8.564102564102564
> > Maximum tombstones per slice (last five minutes): 42
> >
> > According to "nodetool compactionhistory" for <keyspace>.<table>,
> > the oldest timestamp is "Thu, 30 Jun 2016 13:14:23 GMT"
> > and the most recent one is "Thu, 07 Jul 2016 12:15:50 GMT" (that is,
> > TODAY).
> >
> > However, the SSTable count is still very high compared to tables that
> > have a different compaction strategy. If I run "nodetool compact
> > <table>", the SSTable count decreases dramatically to a reasonable
> > number.
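One way to track that count over time is to scrape it out of the cfstats output above; a sketch (keyspace/table placeholders as in the thread, nodetool assumed on PATH):

```shell
# Helper: read "nodetool cfstats" output on stdin and print the SSTable count.
sstable_count() {
    awk -F': ' '/SSTable count/ {print $2; exit}'
}

# Usage:
#   nodetool cfstats <keyspace>.<table> | sstable_count
```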
> > I read many articles, including:
> > http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> > However, I can not really tell if this is expected behavior.
> >
> > What concerns me is that I have a high tombstone read count despite these
> > being insert-only tables. Compacting the table makes the tombstone issue
> > disappear. Yes, we are using TTL to expire data after 3 months, and I
> > have not touched the GC grace period.
> >
> > Looking at the file system, I see the very first *-Data.db file, which is
> > 15GB; then there are all the other 43 *-Data.db files, ranging from 50 to
> > 150MB in size.
> >
> > How can I debug this mis-compaction issue? Any help is much appreciated.
> >
> > Best,