[
https://issues.apache.org/jira/browse/CASSANDRA-21184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060331#comment-18060331
]
Dmitry Konstantinov edited comment on CASSANDRA-21184 at 2/23/26 12:57 PM:
---------------------------------------------------------------------------
A drop in partition size is normal. You have live partition data (many rows
with data and older timestamps) and a partition-level tombstone with a newer
timestamp. During compaction, if both are in the SSTables selected for
compaction, they are merged. During the merge, records with newer timestamps
win, so the partition tombstone remains and the live rows created before it are
dropped.
The gc_grace_seconds parameter does not specify how long to store old live
data. It controls how long to keep a tombstone (to prevent zombie rows from
reappearing once the tombstone is removed). Merging of tombstones with live
data can happen at any time.
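To illustrate the reconciliation rule described above, here is a minimal sketch (not Cassandra's actual code; the function name and data shapes are hypothetical) of last-write-wins merging: a partition-level tombstone shadows every row written before its deletion timestamp, and gc_grace_seconds plays no role in that step — it only governs when the tombstone itself may be purged.

```python
# Hypothetical sketch of timestamp reconciliation during compaction.
# rows: mapping of row key -> write timestamp (microseconds in Cassandra;
# plain integers here for clarity).

def merge_partition(rows, partition_tombstone_ts):
    """Return the rows that survive a merge with a partition-level
    tombstone: only rows written strictly after the tombstone's
    deletion timestamp. Shadowed rows are dropped immediately;
    the tombstone itself is retained until gc_grace_seconds elapses."""
    return {key: ts for key, ts in rows.items() if ts > partition_tombstone_ts}

# Rows written at t=100..102, partition deleted at t=200,
# one row written at t=250 after the delete:
rows = {"r1": 100, "r2": 101, "r3": 102, "r4": 250}
surviving = merge_partition(rows, 200)
# Only "r4" survives the merge; gc_grace_seconds never enters the check.
```

The point of the sketch: the size drop observed in the issue is the shadowed rows being dropped at merge time, which is independent of when the tombstone record itself becomes purgeable.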
> Partition level tombstone for huge partition removed before gc_grace_seconds
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-21184
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21184
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Chris Miller
> Priority: Normal
>
> Hi,
> Cassandra version 4.1.2.
> We had a huge partition in our Cassandra production cluster (200GiB+) which
> was caused by an application bug. We were able to complete a partition level
> deletion for this partition and were expecting tombstones to be deleted post
> gc_grace_seconds but it happened after the next compaction. Just wondering if
> this is a bug?
> I have restored a snapshot of the offending CF into our lab environment and
> completed the above steps, but was unable to reproduce this behavior.
> Let me know if you'd like me to complete any activity in the lab.
> Here's the associated extract from the system log.
> {code:java}
> INFO [CompactionExecutor:35] 2026-02-20 22:33:28,790
> CompactionTask.java:253 - Compacted (e217efd0-0e53-11f1-a4ea-e76abc700e0d) 18
> sstables to
> [/data/metadata/data/xxx/yyy-706641f0258211ee9b3439b0035b7956/nb-11350-big,]
> to level=0. 202.139GiB to 59.557GiB (~29% of original) in 37,922,437ms.
> Read Throughput = 5.458MiB/s, Write Throughput = 1.608MiB/s, Row Throughput =
> ~25,477/s. 879,721 total partitions merged to 94,509. Partition merge
> counts were {1:3216, 2:2046, 3:2286, 4:2775, 5:3389, 6:4202, 7:5580, 8:7381,
> 9:9231, 10:11287, 11:13971, 12:15515, 13:12656, 14:983, 15:6, }. Time spent
> writing keys = 37,921,935ms {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)