[ https://issues.apache.org/jira/browse/CASSANDRA-10222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727734#comment-14727734 ]
Joshua McKenzie commented on CASSANDRA-10222:
---------------------------------------------
The redundant run call was an artifact of me changing that interface and then
not reading back through it from that perspective - bad form on my part. I've
refactored the constructor on SnapshotDeletingTask a bit and broken out the
task creation into an {{addFailedSnapshot}} method. I think it cleans that
interface up quite a bit; let me know what you think on that front.
I'm pretty sure all compactions go through the CompactionExecutor. This change
actually gets us a few *extra* deletion attempts, as
{{CompactionManager.ValidationExecutor}} and
{{CompactionManager.CacheCleanupExecutor}} are both going to rely on
{{CompactionExecutor.afterExecute}}, which runs
{{SnapshotDeletingTask.rescheduleFailedTasks}}. I don't think refactoring
those classes is worth the cost just to eliminate a rare no-op removal and
re-add of a task for a snapshot that isn't ready to be deleted yet.
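To make the retry flow above concrete, here is a minimal, self-contained sketch of the pattern. It is not the code from the patch: {{RetryingSnapshotDeleter}}, {{RetryingExecutor}}, {{pendingCount}} and the {{BooleanSupplier}} delete callback are hypothetical names made up for illustration, and only the method names {{addFailedSnapshot}}, {{rescheduleFailedTasks}} and {{afterExecute}} come from the discussion. It assumes a failed deletion is simply parked in a queue and retried once each time the executor finishes a task.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for SnapshotDeletingTask; not the Cassandra class.
final class RetryingSnapshotDeleter implements Runnable {
    // Deletions that failed and are waiting for another attempt.
    private static final Queue<RetryingSnapshotDeleter> failedTasks =
            new ConcurrentLinkedQueue<>();

    private final String snapshotName;
    // Assumption: returns true once the snapshot files could be deleted.
    private final BooleanSupplier deleteAttempt;

    private RetryingSnapshotDeleter(String snapshotName, BooleanSupplier deleteAttempt) {
        this.snapshotName = snapshotName;
        this.deleteAttempt = deleteAttempt;
    }

    // Record a snapshot whose deletion failed so it can be retried later.
    static void addFailedSnapshot(String snapshotName, BooleanSupplier deleteAttempt) {
        failedTasks.add(new RetryingSnapshotDeleter(snapshotName, deleteAttempt));
    }

    @Override
    public void run() {
        // Still not deletable (e.g. files still mapped): park it for the next pass.
        if (!deleteAttempt.getAsBoolean())
            failedTasks.add(this);
    }

    // Retry each task that was queued when this call started, exactly once.
    static void rescheduleFailedTasks() {
        int queued = failedTasks.size();
        for (int i = 0; i < queued; i++) {
            RetryingSnapshotDeleter task = failedTasks.poll();
            if (task == null)
                break;
            task.run();
        }
    }

    static int pendingCount() {
        return failedTasks.size();
    }

    @Override
    public String toString() {
        return "RetryingSnapshotDeleter(" + snapshotName + ")";
    }
}

// Hypothetical executor: every finished task triggers one retry pass.
// Subclasses would inherit the hook, which is where extra attempts come from.
class RetryingExecutor extends ThreadPoolExecutor {
    RetryingExecutor() {
        super(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        RetryingSnapshotDeleter.rescheduleFailedTasks();
    }
}
```

In this sketch a retry that fails again just removes the task and re-adds it to the tail of the queue, which is the cheap no-op described above when an unrelated executor finishes a task before the snapshot is deletable.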
I've gone ahead and manually set up some CI jobs to run on Windows:
[2.2 utest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_2.2_utest_win32/]
[2.2 dtest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_2.2_dtest_win32/]
[3.0 utest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_3.0_utest_win32/]
As dtest runs on the platform currently take 10+ hours, I've limited dtests to
2.2 only for now. I can create and run a 3.0 dtest job if you're concerned
about it; however, with Windows-specific changes like this (and 3.0 being in
beta) I tend to be a *little* less stringent about running the full CI gamut
than I would otherwise be.
> Periodically attempt to delete failed snapshot deletions on Windows
> -------------------------------------------------------------------
>
> Key: CASSANDRA-10222
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10222
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Joshua McKenzie
> Assignee: Joshua McKenzie
> Labels: Windows
> Fix For: 2.2.2
>
>
> The changes in CASSANDRA-9658 leave us in a position where a node on Windows
> will have to be restarted to clear out snapshots that cannot be deleted at
> request time due to sstables still being mapped, thus preventing deletions of
> hard links. A simple periodic task to categorize failed snapshot deletions
> and retry them would help prevent node disk utilization from growing
> unbounded by snapshots as compaction will eventually make these snapshot
> files deletable.
> Given that hard links to files in NTFS don't take up any extra space on disk
> so long as the original file still exists, the only limitation for users from
> this approach will be the inability to 'move' a snapshot file to another
> drive share. They will be copyable, however, so it's a minor platform
> difference.
> This goes directly against the goals of CASSANDRA-8271 and will likely be
> built on top of that code. Until we get buffered I/O performance in line
> with memory-mapped I/O, this is an interim necessity for production
> roll-outs.