[ https://issues.apache.org/jira/browse/CASSANDRA-10222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727734#comment-14727734 ]

Joshua McKenzie commented on CASSANDRA-10222:
---------------------------------------------

The redundant run call was an artifact of my changing that interface and then
not reading back through it from that perspective; bad form on my part. I've
refactored the constructor on SnapshotDeletingTask a bit and broken the task
creation out into an {{addFailedSnapshot}} method. I think that cleans the
interface up quite a bit; let me know what you think on that front.
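To make the refactoring above concrete, here is a minimal sketch of a task with a private constructor and an {{addFailedSnapshot}} factory. It is not the actual Cassandra code: only the names {{SnapshotDeletingTask}}, {{addFailedSnapshot}} and {{rescheduleFailedTasks}} come from this comment, while the queue, the {{pendingCount}} helper and the recursive-delete logic are assumptions made for illustration. Retries run inline here for simplicity; a real implementation would more likely hand them to a background executor.

```java
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; NOT Cassandra's implementation. Only the method names
// addFailedSnapshot / rescheduleFailedTasks are taken from the JIRA comment.
class SnapshotDeletingTask implements Runnable {
    // Snapshot deletions that failed and are waiting for another attempt.
    private static final Queue<SnapshotDeletingTask> failedTasks = new ConcurrentLinkedQueue<>();

    private final File path;

    // Private constructor: callers register failures via addFailedSnapshot.
    private SnapshotDeletingTask(File path) {
        this.path = path;
    }

    // Record a snapshot directory whose deletion failed (e.g. a hard-linked
    // sstable was still mapped on Windows).
    static void addFailedSnapshot(File path) {
        failedTasks.add(new SnapshotDeletingTask(path));
    }

    @Override
    public void run() {
        if (!deleteRecursive(path))
            failedTasks.add(this); // still locked: keep it for a later pass
    }

    // Retry each task that was queued when the pass started. Taking the size
    // up front stops a task that re-queues itself from being retried in a loop.
    static void rescheduleFailedTasks() {
        int pending = failedTasks.size();
        for (int i = 0; i < pending; i++) {
            SnapshotDeletingTask task = failedTasks.poll();
            if (task == null)
                break;
            task.run();
        }
    }

    static int pendingCount() {
        return failedTasks.size();
    }

    // Depth-first delete; true if the path no longer exists afterwards.
    private static boolean deleteRecursive(File f) {
        File[] children = f.listFiles();
        if (children != null)
            for (File child : children)
                deleteRecursive(child);
        return f.delete() || !f.exists();
    }
}
```

Keeping construction behind {{addFailedSnapshot}} means callers cannot create a task without also queueing it, which is the interface cleanup the comment describes.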

I'm pretty sure all compactions go through the CompactionExecutor. This change
actually results in a few *extra* deletion attempts, since
{{CompactionManager.ValidationExecutor}} and
{{CompactionManager.CacheCleanupExecutor}} both rely on
{{CompactionExecutor.afterExecute}} and so will also run
{{SnapshotDeletingTask.rescheduleFailedTasks}}. I don't think refactoring those
classes is worth the cost just to eliminate the rare, potentially no-op removal
and re-add of a snapshot deletion task that isn't ready yet.
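The executor hook can be sketched with the standard {{ThreadPoolExecutor.afterExecute}} override. The classes below are simplified, hypothetical stand-ins for Cassandra's executors: the validation and cache-cleanup executors are modelled as subclasses (an assumption about how they pick up the hook), and {{RetryHook}} is an invented placeholder that only counts calls. The sketch shows why every task that finishes on a subclass also triggers the retry pass.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical placeholder for SnapshotDeletingTask.rescheduleFailedTasks():
// it only counts how often the hook fires.
class RetryHook {
    static final AtomicInteger calls = new AtomicInteger();

    static void rescheduleFailedTasks() {
        calls.incrementAndGet();
    }
}

// Simplified stand-in: every finished task triggers a retry pass, since a
// completed compaction may have made pending snapshot files deletable.
class CompactionExecutor extends ThreadPoolExecutor {
    CompactionExecutor(int threads) {
        super(threads, threads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        RetryHook.rescheduleFailedTasks();
    }
}

// Subclasses inherit afterExecute, so validation and cache-cleanup work also
// runs the retry pass: the "extra" attempts mentioned in the comment.
class ValidationExecutor extends CompactionExecutor {
    ValidationExecutor() {
        super(1);
    }
}

class CacheCleanupExecutor extends CompactionExecutor {
    CacheCleanupExecutor() {
        super(1);
    }
}
```

Because the hook lives in the base class, each extra call costs only one pass over the failed-deletion queue, which is the trade-off the comment accepts instead of refactoring the subclasses.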

I've gone ahead and manually set up some CI jobs to run on Windows:
[2.2 utest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_2.2_utest_win32/]
[2.2 dtest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_2.2_dtest_win32/]
[3.0 utest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/10222_3.0_utest_win32/]

As dtest runs on Windows currently take 10+ hours, I've limited dtests to 2.2
only for now. I can create and run a 3.0 dtest job if you're concerned about it;
however, with Windows-specific changes like this (and with 3.0 in beta), I tend
to be a *little* less stringent about running the full CI gamut than I otherwise
would be.

> Periodically attempt to delete failed snapshot deletions on Windows
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-10222
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10222
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>              Labels: Windows
>             Fix For: 2.2.2
>
>
> The changes in CASSANDRA-9658 leave us in a position where a node on Windows 
> will have to be restarted to clear out snapshots that cannot be deleted at 
> request time because their sstables are still mapped, which prevents deletion 
> of the hard links. A simple periodic task that tracks failed snapshot 
> deletions and retries them would keep snapshots from growing node disk 
> utilization without bound, since compaction will eventually make these 
> snapshot files deletable.
> Given that hard links to files in NTFS don't take up any extra space on disk 
> so long as the original file still exists, the only limitation this approach 
> imposes on users will be the inability to 'move' a snapshot file to another 
> drive share. Snapshot files can still be copied, however, so it's a minor 
> platform difference.
> This goes directly against the goals of CASSANDRA-8271 and will likely be 
> built on top of that code. Until buffered I/O performance is in line with 
> memory-mapped I/O, this is an interim necessity for production 
> rollouts.
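For comparison, the purely time-based "simple periodic task" proposed in the description could look roughly like the following, using {{ScheduledExecutorService.scheduleWithFixedDelay}}. The {{PeriodicSnapshotCleaner}} class and its deletion predicate are hypothetical: the predicate stands in for whatever actually attempts the file deletion and returns true once it succeeds.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch of a timer-driven retry of failed snapshot deletions.
class PeriodicSnapshotCleaner<T> {
    private final Queue<T> failed = new ConcurrentLinkedQueue<>();
    // Attempts a deletion; returns true once it has succeeded.
    private final Predicate<T> tryDelete;

    PeriodicSnapshotCleaner(Predicate<T> tryDelete) {
        this.tryDelete = tryDelete;
    }

    void addFailed(T snapshot) {
        failed.add(snapshot);
    }

    int pending() {
        return failed.size();
    }

    // One retry pass: removeIf drops every snapshot whose deletion now succeeds.
    void retryOnce() {
        failed.removeIf(tryDelete);
    }

    // Note: if tryDelete throws, scheduleWithFixedDelay suppresses all later runs,
    // so a real deleter should catch and log its own failures.
    ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(this::retryOnce, period, period, unit);
    }
}
```

A fixed-delay timer retries even when nothing has changed, whereas hooking the retry onto compaction completion ties attempts to the event that eventually makes the snapshot files deletable.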



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
