[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648950#comment-14648950
 ] 

Stefania commented on CASSANDRA-7066:
-------------------------------------

[~benedict], ready for a first round of review.

* Incremental CRC checks and single log file were already available.

* I've added logging of the latest update time and a checksum on update times 
and sizes for all files of an old descriptor. These are calculated when an 
sstable is obsoleted. If they do not match when we are about to delete the 
files, then we skip this record's files. The checksum is somewhat redundant since 
it is difficult to change file content without changing the update time, so it 
can be removed if you prefer.

* I've renamed {{sstablelister}} to {{sstableutil}} and added an option to 
clean up any outstanding transactions ({{sstableutil -c ks table}} will perform 
the same tasks as we do on startup). If you would rather have a tool that only 
does this, i.e. something like {{sstablecleanup}}, then let me know now and it 
can be changed easily.

* I've removed the ancestors from the compression metadata.

* I've also updated the dtests for sstableutil in [this 
commit|https://github.com/stef1927/cassandra-dtest/commit/6076cfd9c32d463ac245eed6d34e9b7921a0a7cf]. 
I will create a pull request once we have finalized the tool semantics.
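
For illustration only, the update-time and checksum guard from the second bullet could be sketched roughly as below. This is not the code in the patch: the function names {{snapshot}} and {{safe_to_delete}}, the use of CRC32, the string fed to the checksum, and the handling of missing files are all assumptions made for the sketch.

```python
import os
import zlib


def snapshot(paths):
    """Return (latest_mtime_ns, crc) for a group of files.

    Hypothetical helper: the CRC covers each file's update time and size,
    not its contents.
    """
    latest = 0
    crc = 0
    for path in sorted(paths):
        st = os.stat(path)
        latest = max(latest, st.st_mtime_ns)
        # Fold this file's update time and size into the running CRC32.
        crc = zlib.crc32(f"{st.st_mtime_ns}:{st.st_size};".encode(), crc)
    return latest, crc


def safe_to_delete(paths, recorded):
    """True only if the files still match what was recorded at obsoletion.

    Assumption: a missing file counts as a mismatch, so the record is skipped.
    """
    try:
        return snapshot(paths) == recorded
    except FileNotFoundError:
        return False
```

The idea is to call {{snapshot}} when the sstable is obsoleted, store the result with the record, and call {{safe_to_delete}} just before removing the files; a mismatch means the files were touched in between and the record is skipped.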

> Simplify (and unify) cleanup of compaction leftovers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>            Priority: Minor
>              Labels: benedict-to-commit, compaction
>             Fix For: 3.0 alpha 1
>
>         Attachments: 7066.txt
>
>
> Currently we manage a list of in-progress compactions in a system table, 
> which we use to clean up incomplete compactions when we're done. The problem 
> with this is that 1) it's a bit clunky (and leaves us in positions where we 
> can unnecessarily clean up completed files, or conversely not clean up files 
> that have been superseded); and 2) it's only used for a regular compaction - 
> no other compaction types are guarded in the same way, so they can result in 
> duplication if we fail before deleting the replacements.
> I'd like to see each sstable store in its metadata its direct ancestors, and 
> on startup we simply delete any sstables that occur in the union of all 
> ancestor sets. This way as soon as we finish writing we're capable of 
> cleaning up any leftovers, so we never get duplication. It's also much easier 
> to reason about.
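
As a rough illustration of the startup rule proposed in the description above (and not of the log-file approach discussed in the comment), the set of leftovers is the set of sstables on disk that appear in some other sstable's ancestor set. The function name {{leftovers}} and the mapping from generation number to ancestor set are hypothetical.

```python
def leftovers(ancestors_by_generation):
    """Return the generations that are safe to delete on startup.

    `ancestors_by_generation` maps each sstable generation found on disk
    to the set of generations recorded as its direct ancestors.
    """
    present = set(ancestors_by_generation)
    # Union of all ancestor sets across every sstable on disk.
    superseded = set().union(*ancestors_by_generation.values())
    # Only sstables that still exist can be deleted.
    return present & superseded
```

For example, if generations 1, 2 and 3 are on disk and 3 records 1 and 2 as its ancestors, the sketch returns 1 and 2.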



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
