[
https://issues.apache.org/jira/browse/CASSANDRA-21184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060331#comment-18060331
]
Dmitry Konstantinov edited comment on CASSANDRA-21184 at 2/23/26 12:57 PM:
---------------------------------------------------------------------------
A drop in partition size is normal. You have live partition data (many rows
with data and older timestamps) and a partition-level tombstone with a newer
timestamp. During compaction, if both are in the SSTables selected for
compaction, they are merged. During the merge, records with newer timestamps
win, so the partition tombstone remains and the live rows created before it are
dropped.
The gc_grace_seconds parameter does not specify how long to store old live
data. It controls how long to keep a tombstone (to prevent zombie rows from
reappearing once the tombstone is removed). Merging of tombstones with live
data can happen at any time.
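To illustrate the reconciliation rule described above, here is a minimal sketch (not Cassandra's actual code; the function name and data shapes are hypothetical) of last-write-wins merging: a partition-level tombstone shadows every row written before its deletion timestamp, and gc_grace_seconds plays no role in that step — it only governs when the tombstone itself may be purged.

```python
# Hypothetical sketch of timestamp reconciliation during compaction.
# rows: mapping of row key -> write timestamp (microseconds in Cassandra;
# plain integers here for clarity).

def merge_partition(rows, partition_tombstone_ts):
    """Return the rows that survive a merge with a partition-level
    tombstone: only rows written strictly after the tombstone's
    deletion timestamp. Shadowed rows are dropped immediately;
    the tombstone itself is retained until gc_grace_seconds elapses."""
    return {key: ts for key, ts in rows.items() if ts > partition_tombstone_ts}

# Rows written at t=100..102, partition deleted at t=200,
# one row written at t=250 after the delete:
rows = {"r1": 100, "r2": 101, "r3": 102, "r4": 250}
surviving = merge_partition(rows, 200)
# Only "r4" survives the merge; gc_grace_seconds never enters the check.
```

The point of the sketch: the size drop observed in the issue is the shadowed rows being dropped at merge time, which is independent of when the tombstone record itself becomes purgeable.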
> Partition level tombstone for huge partition removed before gc_grace_seconds
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-21184
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21184
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Chris Miller
> Priority: Normal
>
> Hi,
> Cassandra version 4.1.2.
> We had a huge partition in our Cassandra production cluster (200GiB+) which
> was caused by an application bug. We were able to complete a partition level
> deletion for this partition and were expecting tombstones to be deleted post
> gc_grace_seconds but it happened after the next compaction. Just wondering if
> this is a bug?
> I have restored a snapshot of the offending CF into our lab environment and
> completed the above steps, but was unable to reproduce this behavior.
> Let me know if you'd like me to complete any activity in the lab.
> Here's the associated extract from the system log.
> {code:java}
> INFO [CompactionExecutor:35] 2026-02-20 22:33:28,790
> CompactionTask.java:253 - Compacted (e217efd0-0e53-11f1-a4ea-e76abc700e0d) 18
> sstables to
> [/data/metadata/data/xxx/yyy-706641f0258211ee9b3439b0035b7956/nb-11350-big,]
> to level=0. 202.139GiB to 59.557GiB (~29% of original) in 37,922,437ms.
> Read Throughput = 5.458MiB/s, Write Throughput = 1.608MiB/s, Row Throughput =
> ~25,477/s. 879,721 total partitions merged to 94,509. Partition merge
> counts were {1:3216, 2:2046, 3:2286, 4:2775, 5:3389, 6:4202, 7:5580, 8:7381,
> 9:9231, 10:11287, 11:13971, 12:15515, 13:12656, 14:983, 15:6, }. Time spent
> writing keys = 37,921,935ms {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)