[
https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151348#comment-13151348
]
Sylvain Lebresne commented on CASSANDRA-3406:
---------------------------------------------
Part of the patch is just a slight refactoring of CompactionManager to remove some
existing code duplication between performCleanup and performScrub. I don't
claim such refactoring is a priority or anything, but I would venture that it
is a good thing in itself. Once that refactoring is done, the new operation is
literally 4 lines. Then there is the cruft to make it callable from nodetool,
but overall it doesn't sound like much complexity to me.
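As a rough illustration of the kind of refactoring described above (this is not
the code from 3406.patch; apart from performCleanup and performScrub, every
name below is hypothetical, and sstables are stood in for by plain strings), a
shared driver could make each operation a few lines:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; the real CompactionManager is far more involved.
class CompactionSketch {
    /** One unit of work applied to a single sstable (hypothetical interface). */
    interface SSTableOperation {
        void perform(String sstable);
    }

    private final List<String> sstables;
    private final List<String> log;

    CompactionSketch(List<String> sstables, List<String> log) {
        this.sstables = sstables;
        this.log = log;
    }

    // Shared driver: the loop that cleanup and scrub would otherwise each duplicate.
    private void performAllSSTableOperation(SSTableOperation op) {
        for (String sstable : sstables) {
            op.perform(sstable);
        }
    }

    void performCleanup() {
        performAllSSTableOperation(s -> log.add("cleanup " + s));
    }

    void performScrub() {
        // Scrub may discard data, so it snapshots first (modelled here as a log entry).
        log.add("snapshot");
        performAllSSTableOperation(s -> log.add("scrub " + s));
    }

    // The new operation: rewrite every sstable, with no snapshot and no corruption checks.
    void performSSTableRewrite() {
        performAllSSTableOperation(s -> log.add("rewrite " + s));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CompactionSketch manager =
            new CompactionSketch(Arrays.asList("sstable-1", "sstable-2"), log);
        manager.performSSTableRewrite();
        System.out.println(log);
    }
}
```

With the loop factored out, the rewrite operation differs from scrub only in
what it does per sstable and in skipping the snapshot.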
Now for the benefits: it is clearly *not* about saving I/O or CPU. The goals are:
* to avoid having an operation called 'scrub' be part of the normal upgrade path,
because it's a scary name. Yes, it's just a naming thing (but names are
important), and yes, nobody has complained about that name, but let's be honest:
scrub was not created for rewriting sstables post-upgrade, and the
name does not fit that use.
* scrub does an automatic snapshot. That's totally reasonable for scrub's initial
purpose, given that it can discard data (albeit corrupted data), but it's
just annoying when you've already snapshotted everything (and maybe moved the
snapshot to some safe place) just before your upgrade because you're a good guy.
* scrub can discard data. I think this is something that should never go
unnoticed. Pushing the use of scrub for cases where there is absolutely no
reason to suspect corruption makes it more likely that discarded data goes
unnoticed, at least at first.
So yes, all of this is mostly details, and sorry to be so verbose about such a
minor issue, but I happen to think that such details are important and that this
ticket would be an improvement.
> Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't
> intended to.
> -----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-3406
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3406
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Trivial
> Fix For: 1.0.4
>
> Attachments: 3406.patch
>
>
> Scrub was intended to check that a data file is not corrupted, to try to
> correct some forms of corruption, and to discard the data when it can't repair it.
> But we are now using it also for:
> * major upgrades, to have sstables in the new data format for streaming's sake
> (that one could be "fixed" independently by supporting old format during
> streaming)
> * to force the compaction of existing sstables after changing the compression
> algorithm
> We should probably provide a separate tool/command for those last two tasks
> since:
> * we could have a better name, like upgrade_sstables or rewrite_sstables for
> that operation
> * we could avoid the automatic snapshot that scrub does (and is not expected
> by users for those operations)
> * make it slightly quicker/simpler by avoiding the corruption detection code