[ 
https://issues.apache.org/jira/browse/CASSANDRA-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605978#comment-13605978
 ] 

Jonathan Ellis edited comment on CASSANDRA-3430 at 3/19/13 3:01 AM:
--------------------------------------------------------------------

You're right, there's a race because I assumed that if getNextBackgroundTask is 
already running on CompactionExecutor, it will be part of the compaction 
activity we wait to finish.  Unfortunately I don't see a good way to actually 
make that true; collector.beginCompaction doesn't run until we're well into the 
task (because we can't create the necessary , and finishCompaction runs before 
we unmark.

So instead I'm using the compaction marker itself as an indication that we've 
successfully cancelled everything.  This is obviously more correct, but I'd 
already found a couple of compaction-marker leaks, so I was hoping to make any 
regressions there obvious.  I did the next best thing and added a timed loop 
after which we give up and log.
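
To illustrate the timed loop described above, here is a minimal sketch. It is 
not the code on the 3430-4 branch: the class MarkerWait, the set of marked 
sstable names, and the method waitForMarkersToClear are all hypothetical 
stand-ins for the real compaction-marker bookkeeping.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical stand-in for the bookkeeping of sstables marked as compacting.
public class MarkerWait
{
    private static final Logger logger = Logger.getLogger(MarkerWait.class.getName());

    // Names of sstables currently marked; an assumption made for illustration.
    private final Set<String> compacting = ConcurrentHashMap.newKeySet();

    public void mark(String sstable) { compacting.add(sstable); }
    public void unmark(String sstable) { compacting.remove(sstable); }

    /**
     * Polls until nothing is marked compacting, or gives up after the timeout.
     * Returns true if all markers cleared, false if we gave up (and logged).
     */
    public boolean waitForMarkersToClear(long timeout, TimeUnit unit)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!compacting.isEmpty())
        {
            if (System.nanoTime() - deadline >= 0)
            {
                // A leaked marker shows up here instead of blocking forever.
                logger.warning("Gave up waiting for compactions to stop; still marked: " + compacting);
                return false;
            }
            try
            {
                Thread.sleep(10);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The point of the timeout is that a leaked marker produces a log line rather 
than a hang, which keeps such regressions visible.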

I've also made pause and getNextBackgroundTask serialized, so we can guarantee 
that after pause completes, no new tasks will be generated; or put another way, 
pause can't run until in-progress tasks are done being created.  This shouldn't 
be necessary for correctness but it does make it easier to reason about.
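
A rough sketch of what serializing the two calls could look like, assuming a 
single shared monitor. The class PausableStrategy and its pending counter are 
hypothetical; only the names pause and getNextBackgroundTask come from the 
comment above.

```java
// Hypothetical strategy whose pause() and getNextBackgroundTask() share one lock.
public class PausableStrategy
{
    // Both fields are guarded by "this".
    private boolean paused = false;
    private int pending = 0;

    // Hypothetical helper: records that n units of background work exist.
    public synchronized void addPending(int n) { pending += n; }

    /**
     * Returns the next task, or null if paused or nothing is pending.
     * Because this is synchronized, task creation cannot interleave with pause().
     */
    public synchronized Runnable getNextBackgroundTask()
    {
        if (paused || pending == 0)
            return null;
        pending--;
        return () -> {}; // placeholder for real compaction work
    }

    // Blocks until any in-progress getNextBackgroundTask() call has returned.
    public synchronized void pause() { paused = true; }

    public synchronized void resume() { paused = false; }
}
```

With both methods on the same monitor, once pause() returns no later call can 
hand out a task until resume() is called.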

Pushed to https://github.com/jbellis/cassandra/tree/3430-4, with fix for 2I 
pause.  (3430-3 tried another approach that didn't pan out...)
                
      was (Author: jbellis):
    You're right, there's a race because I assumed that if 
getNextBackgroundTask is already running on CompactionExecutor, it will be part 
of the compaction activity we wait to finish.  Unfortunately I don't see a good 
way to actually make that true; collector.beginCompaction doesn't run until 
we're well into the task (because we can't create the necessary , and 
finishCompaction runs before we unmark.

So instead I'm using the compaction marker itself as an indication that we've 
successfully cancelled everything.  Which is obviously more correct, but I'd 
already found a couple compaction-marker leaks so I was hoping to avoid making 
more of those obvious.  I did the next best thing and added a timed loop after 
which we give up and log.

I've also made pause and getNextBackgroundTask serialized, so we can guarantee 
that after pause completes, no new tasks will be generated; or put another way, 
pause can't run until in-progress tasks are done being created.  This shouldn't 
be necessary for correctness but it does make it easier to reason about.

Pushed to https://github.com/jbellis/cassandra/tree/3430-4, with fix for 2I 
pause.  (3430-3 tried another approach that didn't pan out...)
                  
> Break Big Compaction Lock apart
> -------------------------------
>
>                 Key: CASSANDRA-3430
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3430
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: compaction
>             Fix For: 2.0
>
>         Attachments: 3430-1.0.txt, 3430-1.1.txt, 3430-v2.txt, 3430-v3.txt
>
>

