[ https://issues.apache.org/jira/browse/CASSANDRA-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tobias Lindaaker updated CASSANDRA-18507:
-----------------------------------------
Description:
If there isn't enough disk space available to compact all existing sstables,
Cassandra will attempt to perform a partial compaction by removing sstables
from the set of candidate sstables to be compacted, starting with the largest
one. It is possible that an sstable removed from the set contains data for
which there are tombstones in another (more recent) sstable that remains in
the set. Since the overlaps between sstables are computed when the
{{CompactionController}} is created, and the {{CompactionController}} is
created before any sstables are removed from the set of sstables to be
compacted, the computed overlaps are outdated by the time Cassandra checks
which sstables are covered by a given tombstone. This leads to the faulty
conclusion that the tombstones can be pruned during the compaction, causing
the deleted data to be resurrected.
The issue is present in Cassandra 4.0 and 4.1. Cassandra 3.11 creates the
{{CompactionController}} after the set of sstables to compact has been reduced,
and is thus not affected. {{trunk}} does not appear to support partial
compactions at all, but instead refuses to compact when the disk is full.
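The ordering problem described above can be illustrated with a small toy
model. This is only a sketch in Python, not Cassandra code: the names
SSTable, ToyController, reduce_scope, compact, read and run are all
hypothetical, timestamps are plain integers, and gc grace is ignored. It
assumes only what the description states: overlaps are computed once when
the controller is created, and the largest sstables are dropped from the
candidate set when space is short.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so "in" checks compare objects
class SSTable:
    name: str
    size: int
    data: dict        # key -> write timestamp
    tombstones: dict  # key -> deletion timestamp


class ToyController:
    """Hypothetical stand-in: overlaps are computed once, at construction."""

    def __init__(self, all_sstables, compacting):
        self.overlapping = [s for s in all_sstables if s not in compacting]

    def can_purge(self, key, deleted_at):
        # A tombstone may be dropped only if no sstable outside the
        # compaction still holds data that the tombstone shadows.
        return all(s.data.get(key, float("inf")) > deleted_at
                   for s in self.overlapping)


def reduce_scope(candidates, available_space):
    """Drop the largest sstables until the remainder fits on disk."""
    remaining = sorted(candidates, key=lambda s: s.size)
    while remaining and sum(s.size for s in remaining) > available_space:
        remaining.pop()  # the last element is the largest
    return remaining


def compact(controller, compacting):
    """Merge sstables and return (live data, surviving tombstones)."""
    data, tombstones = {}, {}
    for s in compacting:
        for key, ts in s.data.items():
            data[key] = max(ts, data.get(key, ts))
        for key, ts in s.tombstones.items():
            tombstones[key] = max(ts, tombstones.get(key, ts))
    for key, deleted_at in list(tombstones.items()):
        if key in data and data[key] <= deleted_at:
            del data[key]            # shadowed data is discarded
        if controller.can_purge(key, deleted_at):
            del tombstones[key]      # tombstone judged safe to drop
    return data, tombstones


def read(key, sstables):
    """True if the key is visible after reconciling all sstables."""
    written = max((s.data[key] for s in sstables if key in s.data),
                  default=None)
    deleted = max((s.tombstones[key] for s in sstables if key in s.tombstones),
                  default=None)
    return written is not None and (deleted is None or written > deleted)


def run(controller_before_reduction):
    old = SSTable("old", 100, {"k": 1}, {})   # large, holds the deleted row
    new = SSTable("new", 10, {}, {"k": 2})    # small, holds the tombstone
    other = SSTable("other", 10, {"x": 3}, {})
    all_sstables = [old, new, other]
    if controller_before_reduction:
        # Ordering described for 4.0 and 4.1: overlaps computed too early.
        controller = ToyController(all_sstables, all_sstables)
        compacting = reduce_scope(all_sstables, available_space=50)
    else:
        # Ordering described for 3.11: reduce first, then compute overlaps.
        compacting = reduce_scope(all_sstables, available_space=50)
        controller = ToyController(all_sstables, compacting)
    data, tombstones = compact(controller, compacting)
    merged = SSTable("merged", sum(s.size for s in compacting),
                     data, tombstones)
    on_disk = [s for s in all_sstables if s not in compacting] + [merged]
    return read("k", on_disk)
```

In this model, run(True) reports the deleted key "k" as visible again,
because the controller never learns that "old" was left out of the
compaction, while run(False) keeps the tombstone and the key stays deleted.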
> Partial compaction can resurrect deleted data
> ---------------------------------------------
>
> Key: CASSANDRA-18507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18507
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tobias Lindaaker
> Priority: Normal