[
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645489#comment-14645489
]
Stefania commented on CASSANDRA-7066:
-------------------------------------
[~benedict] a first basic single-log-file version is available on
[this branch|https://github.com/stef1927/cassandra/tree/7066-b]. I'll wait to hear from
you regarding adding CRCs and update times.
Here is the write-up that I've added to NEWS.txt:
{quote}
New transaction log files have been introduced to replace the compactions_in_progress
system table. They control the sstable files involved in compactions and other operations
such as flushing and streaming. Use the sstablelister tool to list any sstable files
currently involved in operations that have not yet completed; we define these as temporary files.
A transaction log file contains one sstable per line, with the prefix "add:" or "remove:".
It also contains a final special line, "commit", which is inserted only when the
transaction is committed.
On startup we use these files to clean up any partial transactions that were in progress
when the process exited. If the commit line is found, we keep the new sstables (prefix "add:")
and delete the old sstables (prefix "remove:"); if the commit line is missing, we do the reverse.
Should you lose or delete these log files, both old and new sstable files will be kept
as live files, which will result in duplicated sstables. Should you manually edit these
files, for example by removing or adding the commit line, this would change which sstable
files are retained on startup. See CASSANDRA-7066 for full details.
{quote}
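To make the startup rule above concrete, here is a minimal sketch in Python (not the Java code on the branch). It assumes the log format exactly as described in the write-up: one "add:" or "remove:" line per sstable and an optional "commit" line. The function name `files_to_delete` and the sstable names in the usage example are hypothetical, chosen only for illustration.

```python
def files_to_delete(log_lines):
    """Return the sstables that startup cleanup would delete for one log.

    Assumed format (from the write-up): each line is "add:<sstable>",
    "remove:<sstable>", or the single word "commit".
    """
    added, removed, committed = [], [], False
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line == "commit":
            committed = True
        elif line.startswith("add:"):
            added.append(line[len("add:"):])
        elif line.startswith("remove:"):
            removed.append(line[len("remove:"):])
        else:
            raise ValueError("unrecognised log line: %r" % line)
    # Committed: the new sstables are live, so the old ones are deleted.
    # Not committed: the transaction is rolled back, so the new ones go.
    return removed if committed else added
```

For example, `files_to_delete(["add:new-1", "remove:old-1", "commit"])` selects `old-1` for deletion, while the same log without the "commit" line selects `new-1`.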
> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
> Key: CASSANDRA-7066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Stefania
> Priority: Minor
> Labels: benedict-to-commit, compaction
> Fix For: 3.0 alpha 1
>
> Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table,
> which we use to clean up incomplete compactions when we're done. The problem
> with this is that 1) it's a bit clunky (and leaves us in positions where we
> can unnecessarily clean up completed files, or conversely not clean up files
> that have been superseded); and 2) it's only used for a regular compaction -
> no other compaction types are guarded in the same way, so can result in
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and
> on startup we simply delete any sstables that occur in the union of all
> ancestor sets. This way as soon as we finish writing we're capable of
> cleaning up any leftovers, so we never get duplication. It's also much easier
> to reason about.
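For comparison, the ancestor-based proposal in the issue description could be sketched roughly as follows in Python. This illustrates only the idea as stated there (delete every sstable that appears in the union of all ancestor sets); it is not the transaction-log approach on the branch. The name `leftover_sstables` and the use of integer generations as identifiers are assumptions made for the example.

```python
def leftover_sstables(ancestors_by_sstable):
    """Return the on-disk sstables that appear in some other sstable's ancestors.

    ancestors_by_sstable maps each sstable present on disk (here a
    hypothetical integer generation) to the set of its direct ancestors.
    """
    # Union of all ancestor sets across every sstable found on disk.
    superseded = set().union(*ancestors_by_sstable.values())
    # Only sstables that still exist on disk can actually be deleted.
    return sorted(superseded & set(ancestors_by_sstable))
```

For example, if sstable 3 records ancestors {1, 2} and sstables 1, 2 and 4 record none, then `leftover_sstables({3: {1, 2}, 1: set(), 2: set(), 4: set()})` selects 1 and 2.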
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)