[
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578794#comment-14578794
]
Tupshin Harper commented on CASSANDRA-7066:
-------------------------------------------
Users don't care about SSTables; users care about their data. It's unclear
what impact, if any, this would have on the availability/existence of data. So,
a few questions about failure conditions, all of which would apply to a
single-node cluster with commitlog durability set to batch, for simplicity of
discussion.
Could this result in any circumstances where:
# a write was acknowledged (consistency level met), but then no longer exists
on disk as a result of this sstable cleanup/deletion?
# a datum was queryable (through a memtable or sstable read), but is then either
no longer on disk or no longer queryable?
# a datum was deleted (via a tombstone?) and then comes back?
# the same questions as above, when a snapshot/backup was taken prior to the
sstable cleanup and restoration from that backup was necessary?
If the answer to all of those is "no", then I have a hard time imagining any
objections, though I would love additional input from others. If the answer to
any of them is "yes", then it's a huge problem. :)
Given the reference to "partial results" above, I'd also like some clarity on
whether that has had any user-facing impact on data availability/queryability.
> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
> Key: CASSANDRA-7066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Priority: Minor
> Labels: compaction
> Fix For: 3.x
>
> Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely fail to clean up
> files that have been superseded); and 2) it's only used for regular
> compactions - no other compaction types are guarded in the same way, so they
> can result in duplication if we fail before deleting the replacements.
> I'd like to see each sstable store its direct ancestors in its metadata, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way, as soon as we finish writing, we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.
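>
> And a matching toy sketch of the proposed ancestor-based cleanup; it assumes,
> purely for illustration, that sstables are identified by integer generations
> and that each one's metadata carries the generations of its direct ancestors:
> {code:java}
> import java.util.*;
>
> // Any sstable whose generation appears in the union of all ancestor sets has
> // already been superseded by a fully written replacement, so it can be
> // deleted on startup.
> public class AncestorCleanupSketch
> {
>     record SSTable(int generation, Set<Integer> ancestors) {}
>
>     static Set<Integer> generationsToDelete(Collection<SSTable> onDisk)
>     {
>         Set<Integer> superseded = new HashSet<>();
>         for (SSTable t : onDisk)
>             superseded.addAll(t.ancestors());          // union of all ancestor sets
>
>         Set<Integer> toDelete = new HashSet<>();
>         for (SSTable t : onDisk)
>             if (superseded.contains(t.generation()))   // still on disk, but already compacted away
>                 toDelete.add(t.generation());
>         return toDelete;
>     }
>
>     public static void main(String[] args)
>     {
>         // Generations 1 and 2 were compacted into 5; the node died after 5 was
>         // fully written but before 1 and 2 were removed.
>         List<SSTable> onDisk = List.of(new SSTable(1, Set.of()),
>                                        new SSTable(2, Set.of()),
>                                        new SSTable(5, Set.of(1, 2)));
>         System.out.println("delete generations: " + generationsToDelete(onDisk)); // 1 and 2
>     }
> }
> {code}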
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)