Re: Tombstones not getting purged

Alexander Dejanovski Thu, 20 Jun 2019 00:50:21 -0700

Léo,

if a major compaction isn't a viable option, you could give the
Instaclustr SSTable tools a go to target the partitions with the most
tombstones:
https://github.com/instaclustr/cassandra-sstable-tools/tree/cassandra-2.2#ic-purge


It generates a report like this:

Summary:

+---------+---------+
|         | Size    |
+---------+---------+
| Disk    |  1.9 GB |
| Reclaim | 11.7 MB |
+---------+---------+

Largest reclaimable partitions:

+--------------+--------+---------+-----------------+
| Key          | Size   | Reclaim | Generations     |
+--------------+--------+---------+-----------------+
| 001.2.340862 | 3.2 kB |  3.2 kB | [534, 438, 498] |
| 001.2.946243 | 2.9 kB |  2.8 kB | [534, 434, 384] |
| 001.1.527557 | 2.8 kB |  2.7 kB | [534, 519, 394] |
| 001.2.181797 | 2.6 kB |  2.6 kB | [534, 424, 343] |
| 001.3.475853 | 2.7 kB |    28 B |      [524, 462] |
| 001.0.159704 | 2.7 kB |    28 B |      [440, 247] |
| 001.1.311372 | 2.6 kB |    28 B |      [424, 458] |
| 001.0.756293 | 2.6 kB |    28 B |      [428, 358] |
| 001.2.681009 | 2.5 kB |    28 B |      [440, 241] |
| 001.2.474773 | 2.5 kB |    28 B |      [524, 484] |
| 001.2.974571 | 2.5 kB |    28 B |      [386, 517] |
| 001.0.143176 | 2.5 kB |    28 B |      [518, 368] |
| 001.1.185198 | 2.5 kB |    28 B |      [517, 386] |
| 001.3.503517 | 2.5 kB |    28 B |      [426, 346] |
| 001.1.847384 | 2.5 kB |    28 B |      [436, 396] |
| 001.0.949269 | 2.5 kB |    28 B |      [516, 356] |
| 001.0.756763 | 2.5 kB |    28 B |      [440, 249] |
| 001.3.973808 | 2.5 kB |    28 B |      [517, 386] |
| 001.0.312718 | 2.4 kB |    28 B |      [524, 467] |
| 001.3.632066 | 2.4 kB |    28 B |      [432, 377] |
| 001.1.946590 | 2.4 kB |    28 B |      [519, 389] |
| 001.1.798591 | 2.4 kB |    28 B |      [434, 388] |
| 001.3.953922 | 2.4 kB |    28 B |      [432, 375] |
| 001.2.585518 | 2.4 kB |    28 B |      [432, 375] |
| 001.3.284942 | 2.4 kB |    28 B |      [376, 432] |
+--------------+--------+---------+-----------------+

Once you've identified these partitions, you can run a compaction on the
SSTables that contain them (identified using "nodetool getsstables").
Note that user-defined compactions are only available for STCS.
Also, ic-purge itself performs a compaction without writing to disk (it
should look like a validation compaction), so the docs rightly describe it
as an "intensive process" (though no more intensive than a repair).
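The lookup step can be scripted. Below is a minimal sketch, assuming the keys from the ic-purge report can be passed to `nodetool getsstables` as-is (composite partition keys may need a different format). It only builds the comma-separated file list; on 3.0 you would then hand that list to the `forceUserDefinedCompaction` operation of the `org.apache.cassandra.db:type=CompactionManager` MBean through a JMX client, and some later nodetool versions also offer `nodetool compact --user-defined` (check `nodetool help compact` on your version). The function name `collect_sstables` and the `NODETOOL` override are hypothetical helpers for illustration.

```sh
#!/bin/sh
# Hypothetical helper: collect the SSTables that hold the given partitions
# and print them as one comma-separated list (duplicates removed), the
# format CompactionManager.forceUserDefinedCompaction takes.
# Set NODETOOL to override the nodetool binary (handy for testing).

# Usage: collect_sstables <keyspace> <table> <partition key>...
collect_sstables() (
    ks=$1
    table=$2
    shift 2
    for key in "$@"; do
        # prints the path of each Data.db file holding the key, one per line
        "${NODETOOL:-nodetool}" getsstables "$ks" "$table" "$key"
    done | sort -u | paste -sd, -
)
```

For example, `collect_sstables stats tablename 001.2.340862 001.2.946243 001.1.527557` (keyspace and table taken from the path in Léo's first message) prints the SSTables shared by the top three reclaimable partitions as a single list.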

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Thu, Jun 20, 2019 at 9:17 AM Alexander Dejanovski <a...@thelastpickle.com>
wrote:

> My bad on the date formatting, it should have been %Y/%m/%d,
> otherwise the SSTables aren't sorted properly.
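With that fix applied, the listing loop can be written as below. This is a minimal sketch, assuming GNU `date` (as the original loop does) and sstablemetadata lines of the form `Maximum timestamp: <microseconds>`; it passes `--gc_grace_seconds` with two dashes, as in Léo's own command. `micros_to_date` and `timestamp_of` are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: list SSTables sorted by max timestamp, using a sortable date format.
# Run from the table data directory. Assumes GNU date.

GC_GRACE_SECONDS=259200   # gc_grace value used in this thread

# Convert a microsecond write timestamp to "YYYY/MM/DD hh:mm:ss".
# Like the original loop, keep only the first 10 digits (the seconds part).
micros_to_date() {
    date --date="@$(printf '%s' "$1" | cut -c 1-10)" '+%Y/%m/%d %H:%M:%S'
}

# Usage: timestamp_of <Minimum|Maximum> "<sstablemetadata output>"
timestamp_of() {
    printf '%s\n' "$2" | grep "$1 time" | cut -d" " -f3
}

for f in *Data.db; do
    [ -e "$f" ] || continue   # the glob matched nothing
    meta=$(sstablemetadata --gc_grace_seconds "$GC_GRACE_SECONDS" "$f")
    echo "$(micros_to_date "$(timestamp_of Maximum "$meta")")" \
         "$(micros_to_date "$(timestamp_of Minimum "$meta")")" \
         "$(printf '%s\n' "$meta" | grep droppable)" \
         "$(ls -lh "$f")"
done | sort
```

Because the year now comes first, a plain `sort` orders the lines chronologically by maximum timestamp.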
>
> You have 2 SSTables that claim to cover timestamps from 1940 to 2262,
> which is weird.
> Aside from that, you have big overlaps all over the SSTables, so that's
> probably why your tombstones are sticking around.
>
> Your best shot here will be a major compaction of that table, since it
> doesn't seem so big. Remember to use the --split-output flag on the
> compaction command to avoid ending up with a single SSTable after that.
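A minimal sketch of that major compaction, using the `-s`/`--split-output` option of `nodetool compact`. Keep in mind that the compaction still needs enough free disk space to write the new SSTables before the old ones are deleted. `major_compact` and the `DRY_RUN`/`NODETOOL` variables are hypothetical, and `stats`/`tablename` below are placeholders taken from Léo's path.

```sh
#!/bin/sh
# Hypothetical helper: run a major compaction on one table with split output.
# Set DRY_RUN=1 to only print the command; NODETOOL overrides the binary.

# Usage: major_compact <keyspace> <table>
major_compact() (
    # -s / --split-output: write several SSTables instead of a single big one
    set -- "${NODETOOL:-nodetool}" compact --split-output "$1" "$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
)
```

`DRY_RUN=1 major_compact stats tablename` prints `nodetool compact --split-output stats tablename`; drop `DRY_RUN=1` to actually run it.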
>
> Cheers,
>
>
>
> On Thu, Jun 20, 2019 at 8:13 AM Léo FERLIN SUTTON
> <lfer...@mailjet.com.invalid> wrote:
>
>> On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi Leo,
>>>
>>> The overlapping SSTables are indeed the most probable cause as suggested
>>> by Jeff.
>>> Do you know if the tombstone compactions actually triggered? (Did the
>>> SSTable names change?)
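Because compaction writes new SSTables with new generation numbers instead of modifying existing files, one way to answer that question is to save the list of Data.db names and compare it with a later listing. A sketch, with hypothetical function names and a placeholder data directory:

```sh
#!/bin/sh
# Hypothetical helpers to check whether SSTables were rewritten by compaction.

# Usage: list_sstables <table data directory>
# Prints the sorted Data.db file names found in that directory.
list_sstables() {
    for f in "$1"/*Data.db; do
        [ -e "$f" ] || continue      # glob did not match anything
        basename "$f"
    done | sort
}

# Usage: sstable_changes <earlier listing> <later listing>
# Both files must be sorted, as produced by list_sstables.
sstable_changes() {
    comm -23 "$1" "$2" | sed 's/^/removed: /'   # only in the earlier listing
    comm -13 "$1" "$2" | sed 's/^/added: /'     # only in the later listing
}
```

Run `list_sstables <table data directory> > before.txt`, wait for the compaction window, run it again into `after.txt`, then `sstable_changes before.txt after.txt`.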
>>>
>>
>> Hello!
>>
>> I believe they have changed. I do not remember the SSTable names, but the
>> "last modified" date has changed recently for these tables.
>>
>>
>>> Could you run the following command to list SSTables and provide us the
>>> output? It will display their timestamp ranges along with the
>>> estimated droppable tombstone ratio.
>>>
>>>
>>> for f in *Data.db; do
>>>   meta=$(sstablemetadata -gc_grace_seconds 259200 $f)
>>>   echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y %H:%M:%S') \
>>>        $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%m/%d/%Y %H:%M:%S') \
>>>        $(echo "$meta" | grep droppable) $(ls -lh $f)
>>> done | sort
>>>
>>
>> Here are the results:
>>
>> ```
>> 04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
>> 04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
>> 05/01/2019 08:03:24 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 4.6G May 1 08:39 md-152253-big-Data.db
>> 05/09/2018 06:35:03 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 22:09 md-147948-big-Data.db
>> 05/21/2019 05:28:01 03/06/2018 16:46:16 Estimated droppable tombstones: 0.45150604672159905 -rw-r--r-- 1 cassandra cassandra 1.1G Jun 20 05:55 md-167943-big-Data.db
>> 05/22/2019 11:54:33 03/06/2018 16:46:16 Estimated droppable tombstones: 0.30826566640798975 -rw-r--r-- 1 cassandra cassandra 7.6G Jun 20 04:35 md-167913-big-Data.db
>> 06/13/2019 00:02:40 03/06/2018 16:46:08 Estimated droppable tombstones: 0.20980847354256815 -rw-r--r-- 1 cassandra cassandra 6.9G Jun 20 04:51 md-167917-big-Data.db
>> 06/17/2019 05:56:12 06/16/2019 20:33:52 Estimated droppable tombstones: 0.6114260192855792 -rw-r--r-- 1 cassandra cassandra 257M Jun 20 05:29 md-167938-big-Data.db
>> 06/18/2019 11:21:55 03/06/2018 17:48:22 Estimated droppable tombstones: 0.18655813086540254 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:52 md-167940-big-Data.db
>> 06/19/2019 16:53:04 06/18/2019 11:22:04 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 425M Jun 19 17:08 md-167782-big-Data.db
>> 06/20/2019 04:17:22 06/19/2019 16:53:04 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 146M Jun 20 04:18 md-167921-big-Data.db
>> 06/20/2019 05:50:23 06/20/2019 04:17:32 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 42M Jun 20 05:56 md-167946-big-Data.db
>> 06/20/2019 05:56:03 06/20/2019 05:50:32 Estimated droppable tombstones: 0.0 -rw-r--r-- 2 cassandra cassandra 4.8M Jun 20 05:56 md-167947-big-Data.db
>> 07/03/2018 17:26:54 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 27G Apr 13 17:45 md-147919-big-Data.db
>> 09/09/2018 18:55:23 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 30G Apr 13 18:57 md-147926-big-Data.db
>> 11/30/2018 11:52:33 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 14G Apr 13 13:53 md-147908-big-Data.db
>> 12/20/2018 07:30:03 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0 -rw-r--r-- 1 cassandra cassandra 9.3G Apr 13 13:28 md-147906-big-Data.db
>> ```
>>
>>> You could also check the min and max tokens in each SSTable (not sure if
>>> you get that info from 3.0 sstablemetadata) so that you can detect the
>>> SSTables that overlap on token ranges with the ones that carry the
>>> tombstones, and have earlier timestamps. This way you'll be able to trigger
>>> manual compactions, targeting those specific SSTables.
>>>
>>
>> I have checked and I don't believe the info is available in the 3.0.X
>> version of sstablemetadata :(
>>
>>
>>> The rule for a tombstone to be purged is that there is no SSTable
>>> outside the compaction that would possibly contain the partition and that
>>> would have older timestamps.
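That rule can be approximated from timestamps alone. The sketch below is deliberately conservative: it ignores token ranges, bloom filters and memtables, so it over-reports, but an SSTable it does not list has a minimum timestamp newer than every tombstone in the target and therefore cannot block their purge on timestamp grounds. It assumes sstablemetadata prints `Minimum timestamp:` and `Maximum timestamp:` lines; `sstable_ranges` and `potential_blockers` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helpers: approximate, from timestamps only, which SSTables
# could prevent the tombstones of a target SSTable from being purged.

# Run from the table data directory; prints "<name> <min ts> <max ts>"
# (microseconds) for each SSTable.
sstable_ranges() {
    for f in *Data.db; do
        [ -e "$f" ] || continue
        sstablemetadata "$f" | awk -v name="$f" '
            /Minimum timestamp/ { min = $3 }
            /Maximum timestamp/ { max = $3 }
            END { print name, min, max }'
    done
}

# Usage: potential_blockers <target name> <ranges file>
# A tombstone is purged only if it is older than the minimum timestamp of
# every other SSTable that may hold its partition, so any SSTable whose
# minimum timestamp is <= the target maximum timestamp is a candidate blocker.
potential_blockers() {
    awk -v target="$1" '
        NR == FNR { if ($1 == target) max = $3 + 0; next }
        $1 != target && $2 + 0 <= max { print $1 }
    ' "$2" "$2"
}
```

For example, `sstable_ranges > ranges.txt` followed by `potential_blockers md-167943-big-Data.db ranges.txt` lists the SSTables worth including in a targeted compaction with md-167943.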
>>>
>> Is there a way to log these checks and the decisions made by the
>> compaction thread?
>>
>>
>>> Is this a followup on your previous issue where you were trying to
>>> perform a major compaction on an LCS table?
>>>
>>
>> In a way.
>>
>> We are trying to reclaim the disk space used up by our tombstones (on
>> more than one table). We recently started purging old data in our
>> Cassandra cluster, and since disk space isn't cheap on cloud providers,
>> we want to be sure the data expires correctly and the disk space is
>> reclaimed!
>>
>> The major compaction on the LCS table was one of our unsuccessful
>> attempts (it took too long and used too much disk space, so we abandoned
>> it), and we are currently trying to tweak the compaction parameters to
>> speed things up.
>>
>> Regards.
>>
>> Leo
>>
>>> On Thu, Jun 20, 2019 at 7:02 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Probably overlapping sstables
>>>>
>>>> Which compaction strategy?
>>>>
>>>>
>>>> > On Jun 19, 2019, at 9:51 PM, Léo FERLIN SUTTON <lfer...@mailjet.com.invalid> wrote:
>>>> >
>>>> > I have used the following command to check whether I had droppable
>>>> > tombstones:
>>>> > `/usr/bin/sstablemetadata --gc_grace_seconds 259200 /var/lib/cassandra/data/stats/tablename/md-sstablename-big-Data.db`
>>>> >
>>>> > I checked every SSTable in a loop and found 4 SSTables with droppable
>>>> > tombstones:
>>>> >
>>>> > ```
>>>> > Estimated droppable tombstones: 0.1558453651124074
>>>> > Estimated droppable tombstones: 0.20980847354256815
>>>> > Estimated droppable tombstones: 0.30826566640798975
>>>> > Estimated droppable tombstones: 0.45150604672159905
>>>> > ```
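The "check every SSTable in a loop" step can also filter directly on the compaction setting. A minimal sketch, assuming the `Estimated droppable tombstones: <ratio>` line format shown above and reusing the `--gc_grace_seconds 259200` and `tombstone_threshold` 0.1 values from this message. It uses a strict `>` comparison, which as far as I know matches how the compaction strategy compares the ratio with `tombstone_threshold`. `droppable_ratio` and `exceeds_threshold` are hypothetical names.

```sh
#!/bin/sh
# Sketch: print only the SSTables whose estimated droppable tombstone ratio
# is above tombstone_threshold. Run from the table data directory.

GC_GRACE_SECONDS=259200
TOMBSTONE_THRESHOLD=0.1

# Read sstablemetadata output on stdin and print the droppable ratio.
droppable_ratio() {
    awk -F': ' '/Estimated droppable tombstones/ { print $2 }'
}

# Succeed when ratio $1 is strictly greater than threshold $2 (float compare).
exceeds_threshold() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r + 0 > t + 0) }'
}

for f in *Data.db; do
    [ -e "$f" ] || continue   # the glob matched nothing
    ratio=$(sstablemetadata --gc_grace_seconds "$GC_GRACE_SECONDS" "$f" | droppable_ratio)
    if exceeds_threshold "${ratio:-0}" "$TOMBSTONE_THRESHOLD"; then
        echo "$f $ratio"
    fi
done
```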
>>>> >
>>>> > I changed my compaction configuration this morning (via JMX) to force
>>>> > a tombstone compaction. These are my settings on this node:
>>>> >
>>>> > ```
>>>> > {
>>>> > "max_threshold":"32",
>>>> > "min_threshold":"4",
>>>> > "unchecked_tombstone_compaction":"true",
>>>> > "tombstone_threshold":"0.1",
>>>> > "class":"org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy"
>>>> > }
>>>> > ```
>>>> > The threshold is lower than the droppable tombstone ratio of these
>>>> > SSTables, and I expected the setting `unchecked_tombstone_compaction=true`
>>>> > to force Cassandra to run a "tombstone compaction", yet about 24h later
>>>> > all the tombstones are still there.
>>>> >
>>>> > ## About the cluster :
>>>> >
>>>> > The compaction backlog is clear and here are our Cassandra settings:
>>>> >
>>>> > Cassandra 3.0.18
>>>> > concurrent_compactors: 4
>>>> > compaction_throughput_mb_per_sec: 150
>>>> > sstable_preemptive_open_interval_in_mb: 50
>>>> > memtable_flush_writers: 4
>>>> >
>>>> >
>>>> > Any idea what I might be missing?
>>>> >
>>>> > Regards,
>>>> >
>>>> > Leo
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>>
>>>>
