[
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578747#comment-14578747
]
Benedict commented on CASSANDRA-7066:
-------------------------------------
I've pushed a few suggestions
[here|https://github.com/belliottsmith/cassandra/tree/7066-suggestions]. Mostly
just simplifying the creation and running of SSTableDeletingTask following on
from some of your cleanup.
However, I do wonder if we shouldn't try and eliminate it altogether. We could,
for instance, move the deletion work all into the TransactionLogs class,
wherein a Set of sstables we are obsoleting is maintained; on releasing a
reader we ask the TxnLogs to delete us from the disk and from the set (instead
of releasing a ref). If any of the deletions fail, then once all files have
been deleted the whole TxnLog can be enqueued to a "retry" queue (instead of
SSTableDeletingTask). This would just repeatedly try the whole log cleanup,
just as we do on startup. This seems like it would lead to an easier to
understand lifecycle, and would also manage more clearly the problem you
mention above of competing deletes (since it would be managed by the same
TxnLog, ultimately). We can also rid ourselves of SSTableDeletingTask entirely,
which I think would be nice.
Either way, the current logic looks like it could lead to files not being
deleted that should be, since we release our reference to the TxnLog as soon as
we first try to delete, but the delete can fail, and be retried later (so if we
crash before successful delete, we may startup thinking this file is to keep).
A few other queries:
* Why does SSTableWriter.finish() take a TxnLog? For convenience/clarity?
Perhaps we should have a wrapped SSTableWriter that encapsulates this, so we
don't confuse matters when we have another encapsulation (Rewriter, etc)?
* I wonder if it might be more robust to only log the descriptors (except the
opposing log file, which would be a slight special case, but acceptable since
it would be logged first, so easily disambiguated). Just seems like we may be
less likely to have some mistake that leaves components lying around that we
should have cleaned up.
* On upgrade I guess we should drop the compactions_in_progress table.
[~iamaleksey]: i'm not at all familiar with cross version changes like this.
Could you fill us in on protocol? Presumably this will be affected by downgrade
support, which I don't think it planned anytime soon.
* Are we going to upset users by deleting some of their sstables when their CQL
sstable writer operation fails? Previously any written before close() completed
would continue to exist, whereas now if there is a failure prior to close they
will be deleted. So each CQLSSTableWriter creation effectively functions as a
transaction, whereas before it would permit partial results. Any lurkers got
any opinions? [~jshook] [~jeromatron] [~tupshin]?
I'm really looking forward to this change going in. It cleans up quite a lot in
one fell swoop (tmp, tmplink, compaction leftovers, sstable deletion...)
> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
> Key: CASSANDRA-7066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Priority: Minor
> Labels: compaction
> Fix For: 3.x
>
> Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to cleanup incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily cleanup completed files, or conversely not cleanup files
> that have been superceded); and 2) it's only used for a regular compaction -
> no other compaction types are guarded in the same way, so can result in
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way as soon as we finish writing we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)