[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394673#comment-14394673 ]
Benedict commented on CASSANDRA-7066:
-------------------------------------
Even better. It hadn't occurred to me that the current code was all due to the
lack of idempotency; I assumed there was just concern about leaving a large
amount of data around. There _is_ still a risk that this could be prohibitive
on some systems (say, you have a multi-TB file that has just been compacted).
So, to offer one further alternative that is perhaps only slightly more
complicated and retains the safety:
* create two log files, A and B, each of which logs the _other_; file A also
logs the new file(s) as they're created, and file B also logs the old file(s)
* once done, delete file A, then delete the old files, then delete file B
* on startup, if we find file A, the operation never completed, so we delete
its contents (the new files and file B); if we find only file B, it did
complete, so we delete its contents (the old files)
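The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra's code: the class `TwoLogTransaction`, the `<id>.A.log` / `<id>.B.log` naming and the one-path-per-line log format are all hypothetical, and a real implementation would also fsync each log after appending to it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch of the two-log-file scheme; not Cassandra's actual API.
final class TwoLogTransaction {
    private final Path logA; // lists log B and every new file
    private final Path logB; // lists log A and every old file

    TwoLogTransaction(Path dir, String id) throws IOException {
        logA = pathA(dir, id);
        logB = pathB(dir, id);
        // Each log records the other, so recovery from either one removes both.
        append(logA, logB);
        append(logB, logA);
    }

    // Log a new file in A *before* creating it, so a rollback can always find it.
    void trackNew(Path newFile) throws IOException { append(logA, newFile); }

    // Log an old (input) file in B.
    void trackOld(Path oldFile) throws IOException { append(logB, oldFile); }

    // Commit: delete A, then the old files, then B.
    void commit() throws IOException {
        Files.delete(logA);  // point of no return: the new files are now kept
        deleteListed(logB);  // deletes the old files (log A is already gone)
        Files.delete(logB);
    }

    // Startup recovery for one transaction; safe to run any number of times.
    static void recover(Path dir, String id) throws IOException {
        Path a = pathA(dir, id);
        Path b = pathB(dir, id);
        if (Files.exists(a)) {
            // Never committed: roll back by deleting the new files and log B.
            deleteListed(a);
            Files.delete(a);
        } else if (Files.exists(b)) {
            // Committed: roll forward by deleting the old files.
            deleteListed(b);
            Files.delete(b);
        }
    }

    private static Path pathA(Path dir, String id) { return dir.resolve(id + ".A.log"); }

    private static Path pathB(Path dir, String id) { return dir.resolve(id + ".B.log"); }

    private static void append(Path log, Path entry) throws IOException {
        // One absolute path per line; creates the log on first use.
        Files.write(log, List.of(entry.toAbsolutePath().toString()),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    private static void deleteListed(Path log) throws IOException {
        // deleteIfExists keeps recovery idempotent if it is interrupted and rerun.
        for (String line : Files.readAllLines(log)) {
            Files.deleteIfExists(Path.of(line));
        }
    }
}
```

The safety argument rests on the deletion of A being the single commit point: while A exists, everything it lists (the new files and B) is disposable; once it is gone, everything B lists (the old files) is.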
> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
> Key: CASSANDRA-7066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Priority: Minor
> Labels: compaction
> Fix For: 3.0
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely not clean up files
> that have been superseded); and 2) it's only used for regular compactions;
> no other compaction types are guarded in the same way, so they can result in
> duplication if we fail before deleting the files they replace.
> I'd like to see each sstable store its direct ancestors in its metadata, and
> on startup simply delete any sstables that occur in the union of all
> ancestor sets. This way, as soon as we finish writing, we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.
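The startup check proposed in the description amounts to a set union followed by a filter. A minimal Java sketch, assuming a hypothetical `SSTableInfo` record that exposes a generation number and the generations of its direct ancestors (Cassandra's real metadata classes are not modelled here):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the sstable metadata the proposal describes:
// a generation number plus the generations of its direct ancestors.
record SSTableInfo(int generation, Set<Integer> ancestors) {}

final class AncestorCleanup {
    // Returns every sstable whose generation appears in the union of all
    // ancestor sets, i.e. inputs already replaced by a completed output.
    static List<SSTableInfo> leftovers(List<SSTableInfo> sstables) {
        Set<Integer> allAncestors = new HashSet<>();
        for (SSTableInfo s : sstables) {
            allAncestors.addAll(s.ancestors());
        }
        return sstables.stream()
                .filter(s -> allAncestors.contains(s.generation()))
                .collect(Collectors.toList());
    }
}
```

For example, if generations 1 and 2 were compacted into 3 and the node died before deleting them, sstable 3 lists {1, 2} as ancestors, so `leftovers` returns 1 and 2 while 3 and any unrelated sstable are kept. The sketch assumes an sstable's metadata is only visible once the sstable has been fully written; partially written outputs are outside its scope.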
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)