Léo, if a major compaction isn't a viable option, you could give the Instaclustr SSTable tools a try to target the partitions with the most tombstones: https://github.com/instaclustr/cassandra-sstable-tools/tree/cassandra-2.2#ic-purge
It generates a report like this:

Summary:
+---------+---------+
|         | Size    |
+---------+---------+
| Disk    | 1.9 GB  |
| Reclaim | 11.7 MB |
+---------+---------+

Largest reclaimable partitions:
+--------------+--------+---------+-----------------+
| Key          | Size   | Reclaim | Generations     |
+--------------+--------+---------+-----------------+
| 001.2.340862 | 3.2 kB | 3.2 kB  | [534, 438, 498] |
| 001.2.946243 | 2.9 kB | 2.8 kB  | [534, 434, 384] |
| 001.1.527557 | 2.8 kB | 2.7 kB  | [534, 519, 394] |
| 001.2.181797 | 2.6 kB | 2.6 kB  | [534, 424, 343] |
| 001.3.475853 | 2.7 kB | 28 B    | [524, 462]      |
| 001.0.159704 | 2.7 kB | 28 B    | [440, 247]      |
| 001.1.311372 | 2.6 kB | 28 B    | [424, 458]      |
| 001.0.756293 | 2.6 kB | 28 B    | [428, 358]      |
| 001.2.681009 | 2.5 kB | 28 B    | [440, 241]      |
| 001.2.474773 | 2.5 kB | 28 B    | [524, 484]      |
| 001.2.974571 | 2.5 kB | 28 B    | [386, 517]      |
| 001.0.143176 | 2.5 kB | 28 B    | [518, 368]      |
| 001.1.185198 | 2.5 kB | 28 B    | [517, 386]      |
| 001.3.503517 | 2.5 kB | 28 B    | [426, 346]      |
| 001.1.847384 | 2.5 kB | 28 B    | [436, 396]      |
| 001.0.949269 | 2.5 kB | 28 B    | [516, 356]      |
| 001.0.756763 | 2.5 kB | 28 B    | [440, 249]      |
| 001.3.973808 | 2.5 kB | 28 B    | [517, 386]      |
| 001.0.312718 | 2.4 kB | 28 B    | [524, 467]      |
| 001.3.632066 | 2.4 kB | 28 B    | [432, 377]      |
| 001.1.946590 | 2.4 kB | 28 B    | [519, 389]      |
| 001.1.798591 | 2.4 kB | 28 B    | [434, 388]      |
| 001.3.953922 | 2.4 kB | 28 B    | [432, 375]      |
| 001.2.585518 | 2.4 kB | 28 B    | [432, 375]      |
| 001.3.284942 | 2.4 kB | 28 B    | [376, 432]      |
+--------------+--------+---------+-----------------+

Once you've identified these partitions, you can run a compaction on the SSTables that contain them (found using "nodetool getsstables"). Note that user-defined compactions are only available for STCS.

Also, ic-purge performs a compaction without writing to disk (it should look like a validation compaction), so the docs rightly describe it as an "intensive process" (though no more intensive than a repair).
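A minimal sketch of the "nodetool getsstables" step, assuming `nodetool` is on the PATH and can reach the node, and assuming the keys printed by ic-purge can be passed to getsstables as-is (this depends on the table's partition key type). The function name `collect_sstables` is hypothetical, and `stats`/`tablename` are the keyspace and table from the path in Léo's first message. It gathers the distinct SSTables that hold the partitions with the largest reclaimable size, i.e. the candidates for a user-defined compaction.

```shell
# Hypothetical helper: collect_sstables <keyspace> <table> <key>...
# Prints each SSTable (Data.db path) holding at least one of the keys, once.
collect_sstables() {
  local keyspace=$1 table=$2 key
  shift 2
  for key in "$@"; do
    # nodetool getsstables prints the Data.db files that contain this partition key
    nodetool getsstables "$keyspace" "$table" "$key"
  done | sort -u   # several keys usually share SSTables; keep each path once
}
```

For example, `collect_sstables stats tablename 001.2.340862 001.2.946243 001.1.527557 001.2.181797` would list the SSTables behind the four partitions with a non-trivial "Reclaim" value in the report.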
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On Thu, Jun 20, 2019 at 9:17 AM Alexander Dejanovski <a...@thelastpickle.com> wrote:
> My bad on the date formatting, it should have been: %Y/%m/%d
> Otherwise the SSTables aren't ordered properly.
>
> You have 2 SSTables that claim to cover timestamps from 1940 to 2262,
> which is weird.
> Aside from that, you have big overlaps all over the SSTables, so that's
> probably why your tombstones are sticking around.
>
> Your best shot here will be a major compaction of that table, since it
> doesn't seem that big. Remember to use the --split-output flag on the
> compaction command to avoid ending up with a single SSTable afterwards.
>
> Cheers,
>
> On Thu, Jun 20, 2019 at 8:13 AM Léo FERLIN SUTTON <lfer...@mailjet.com.invalid> wrote:
>
>> On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski <a...@thelastpickle.com> wrote:
>>
>>> Hi Leo,
>>>
>>> The overlapping SSTables are indeed the most probable cause, as suggested
>>> by Jeff.
>>> Do you know if the tombstone compactions actually triggered? (Did the
>>> SSTable names change?)
>>>
>>
>> Hello!
>>
>> I believe they have changed. I do not remember the SSTable names, but the
>> "last modified" date has changed recently for these tables.
>>
>>
>>> Could you run the following command to list the SSTables and send us the
>>> output? It will display their timestamp ranges along with the
>>> estimated droppable tombstone ratio.
>>>
>>>
>>> for f in *Data.db; do
>>>   meta=$(sstablemetadata -gc_grace_seconds 259200 $f)
>>>   echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y %H:%M:%S') \
>>>        $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y %H:%M:%S') \
>>>        $(echo "$meta" | grep droppable) $(ls -lh $f)
>>> done | sort
>>>
>>
>> Here are the results:
>>
>> ```
>> 04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
>> 05/01/2019 08:03:24 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 4.6G May 1 08:39 md-152253-big-Data.db
>> 05/09/2018 06:35:03 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 22:09 md-147948-big-Data.db
>> 05/21/2019 05:28:01 03/06/2018 16:46:16 Estimated droppable tombstones: 0.45150604672159905 -rw-r--r-- 1 cassandra cassandra 1.1G Jun 20 05:55 md-167943-big-Data.db
>> 05/22/2019 11:54:33 03/06/2018 16:46:16 Estimated droppable tombstones: 0.30826566640798975 -rw-r--r-- 1 cassandra cassandra 7.6G Jun 20 04:35 md-167913-big-Data.db
>> 06/13/2019 00:02:40 03/06/2018 16:46:08 Estimated droppable tombstones: 0.20980847354256815 -rw-r--r-- 1 cassandra cassandra 6.9G Jun 20 04:51 md-167917-big-Data.db
>> 06/17/2019 05:56:12 06/16/2019 20:33:52 Estimated droppable tombstones: 0.6114260192855792 -rw-r--r-- 1 cassandra cassandra 257M Jun 20 05:29 md-167938-big-Data.db
>> 06/18/2019 11:21:55 03/06/2018 17:48:22 Estimated droppable tombstones: 0.18655813086540254 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:52 md-167940-big-Data.db
>> 06/19/2019 16:53:04 06/18/2019 11:22:04 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 425M Jun 19 17:08 md-167782-big-Data.db
>> 06/20/2019 04:17:22 06/19/2019 16:53:04 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 146M Jun 20 04:18 md-167921-big-Data.db
>> 06/20/2019 05:50:23 06/20/2019 04:17:32 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 42M Jun 20 05:56 md-167946-big-Data.db
>> 06/20/2019 05:56:03 06/20/2019 05:50:32 Estimated droppable tombstones: 0.0 -rw-r--r-- 2 cassandra cassandra 4.8M Jun 20 05:56 md-167947-big-Data.db
>> 07/03/2018 17:26:54 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 27G Apr 13 17:45 md-147919-big-Data.db
>> 09/09/2018 18:55:23 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 18:57 md-147926-big-Data.db
>> 11/30/2018 11:52:33 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 14G Apr 13 13:53 md-147908-big-Data.db
>> 12/20/2018 07:30:03 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 9.3G Apr 13 13:28 md-147906-big-Data.db
>> ```
>>
>>> You could also check the min and max tokens in each SSTable (not sure if
>>> you get that info from 3.0 sstablemetadata) so that you can detect the
>>> SSTables that overlap on token ranges with the ones that carry the
>>> tombstones and have earlier timestamps. This way you'll be able to trigger
>>> manual compactions targeting those specific SSTables.
>>>
>>
>> I have checked, and I don't believe that info is available in the 3.0.x
>> version of sstablemetadata :(
>>
>>
>>> The rule for a tombstone to be purged is that there is no SSTable
>>> outside the compaction that could contain the partition and that
>>> has older timestamps.
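A minimal sketch of the listing loop quoted above, rewritten as two shell functions that use the %Y/%m/%d format from Alexander's correction, so that `sort` orders the SSTables chronologically by their maximum timestamp. The names `to_date` and `list_sstables` are hypothetical. It assumes GNU `date` (for `--date=@SECONDS`), the `--gc_grace_seconds` spelling from Léo's first message, and that sstablemetadata prints the timestamp as the third space-separated field of the lines containing "Maximum time" and "Minimum time", as the original loop does.

```shell
# Convert an sstablemetadata timestamp (microseconds since the epoch) into a
# date that sorts chronologically as plain text. Keeping the first 10 characters
# (same `cut -c 1-10` trick as the original loop) yields whole seconds for a
# 16-digit microsecond value. -u prints UTC so the output is time-zone independent.
to_date() {
  secs=$(printf '%s\n' "$1" | cut -c 1-10)
  date -u --date="@$secs" '+%Y/%m/%d %H:%M:%S'
}

# Same columns as the quoted loop: max time, min time, droppable ratio, ls -lh.
# Run from the table's data directory; 259200 is the gc_grace_seconds used in the thread.
list_sstables() {
  local gcgs=${1:-259200} f meta max min
  for f in *Data.db; do
    meta=$(sstablemetadata --gc_grace_seconds "$gcgs" "$f")
    max=$(echo "$meta" | grep 'Maximum time' | cut -d' ' -f3)
    min=$(echo "$meta" | grep 'Minimum time' | cut -d' ' -f3)
    echo "$(to_date "$max") $(to_date "$min") $(echo "$meta" | grep droppable) $(ls -lh "$f")"
  done | sort
}
```

On the odd 1940 to 2262 range: `to_date` turns -9223372036854775808 and 9223372036854775807 (Java's Long.MIN_VALUE and Long.MAX_VALUE) into exactly 1940/10/09 19:13:17 and 2262/04/11 23:47:16, so md-167948 and md-167942 appear to report the extreme 64-bit values as their minimum and maximum timestamps.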
>>>
>> Is there a way to log these checks and decisions made by the compaction
>> thread?
>>
>>
>>> Is this a follow-up on your previous issue, where you were trying to
>>> perform a major compaction on an LCS table?
>>>
>>
>> In some way.
>>
>> We are trying to globally reclaim the space used up by our tombstones (on
>> more than one table). We have recently started to purge old data in our
>> Cassandra cluster, and since (on cloud providers) `Disk space isn't cheap`,
>> we are trying to make sure the data correctly expires and the disk space is
>> reclaimed!
>>
>> The major compaction on the LCS table was one of our unsuccessful
>> attempts (too long and too much disk space used, so abandoned), and we are
>> currently trying to tweak the compaction parameters to speed things up.
>>
>> Regards.
>>
>> Leo
>>
>> On Thu, Jun 20, 2019 at 7:02 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Probably overlapping sstables
>>>>
>>>> Which compaction strategy?
>>>>
>>>>
>>>> > On Jun 19, 2019, at 9:51 PM, Léo FERLIN SUTTON <lfer...@mailjet.com.invalid> wrote:
>>>> >
>>>> > I have used the following command to check if I had droppable tombstones:
>>>> > `/usr/bin/sstablemetadata --gc_grace_seconds 259200 /var/lib/cassandra/data/stats/tablename/md-sstablename-big-Data.db`
>>>> >
>>>> > I checked every sstable in a loop and had 4 sstables with droppable tombstones:
>>>> >
>>>> > ```
>>>> > Estimated droppable tombstones: 0.1558453651124074
>>>> > Estimated droppable tombstones: 0.20980847354256815
>>>> > Estimated droppable tombstones: 0.30826566640798975
>>>> > Estimated droppable tombstones: 0.45150604672159905
>>>> > ```
>>>> >
>>>> > I changed my compaction configuration this morning (via JMX) to force
>>>> > a tombstone compaction. These are my settings on this node:
>>>> >
>>>> > ```
>>>> > {
>>>> >   "max_threshold":"32",
>>>> >   "min_threshold":"4",
>>>> >   "unchecked_tombstone_compaction":"true",
>>>> >   "tombstone_threshold":"0.1",
>>>> >   "class":"org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy"
>>>> > }
>>>> > ```
>>>> > The threshold is lower than the estimated droppable tombstone ratio of
>>>> > these sstables, and I expected the setting `unchecked_tombstone_compaction=True`
>>>> > to force Cassandra to run a "Tombstone Compaction", yet about 24h later
>>>> > all the tombstones are still there.
>>>> >
>>>> > ## About the cluster:
>>>> >
>>>> > The compaction backlog is clear, and here are our Cassandra settings:
>>>> >
>>>> > Cassandra 3.0.18
>>>> > concurrent_compactors: 4
>>>> > compaction_throughput_mb_per_sec: 150
>>>> > sstable_preemptive_open_interval_in_mb: 50
>>>> > memtable_flush_writers: 4
>>>> >
>>>> >
>>>> > Any idea what I might be missing?
>>>> >
>>>> > Regards,
>>>> >
>>>> > Leo
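A minimal sketch of the check Léo describes, flagging the SSTables whose estimated droppable tombstone ratio is above `tombstone_threshold` (0.1 in his settings). The function name `above_threshold` is hypothetical; it assumes it is run from the table's data directory and that sstablemetadata prints a line of the form "Estimated droppable tombstones: <ratio>", as in the outputs above. As the thread points out, exceeding the threshold is not sufficient on its own: a tombstone is only purged when no SSTable outside the compaction could contain the partition with older timestamps.

```shell
# Hypothetical helper: above_threshold <tombstone_threshold> [gc_grace_seconds]
# Prints "<Data.db file> <ratio>" for each SSTable whose estimated droppable
# tombstone ratio is strictly greater than the threshold.
above_threshold() {
  local threshold=$1 gcgs=${2:-259200} f ratio
  for f in *Data.db; do
    ratio=$(sstablemetadata --gc_grace_seconds "$gcgs" "$f" \
      | awk -F': ' '/Estimated droppable tombstones/ { print $2 }')
    # Floating-point comparison in awk, since [ ] only compares integers
    if awk -v r="$ratio" -v t="$threshold" 'BEGIN { exit !(r > t) }'; then
      echo "$f $ratio"
    fi
  done
}
```

Running `above_threshold 0.1` in the data directory should list the same SSTables Léo found with non-zero ratios.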