[
https://issues.apache.org/jira/browse/CASSANDRA-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605978#comment-13605978
]
Jonathan Ellis edited comment on CASSANDRA-3430 at 3/19/13 3:01 AM:
--------------------------------------------------------------------
You're right, there's a race because I assumed that if getNextBackgroundTask is
already running on CompactionExecutor, it will be part of the compaction
activity we wait to finish. Unfortunately I don't see a good way to actually
make that true; collector.beginCompaction doesn't run until we're well into the
task (because we can't create the necessary , and finishCompaction runs before
we unmark.
So instead I'm using the compaction marker itself as an indication that we've
successfully cancelled everything. Which is obviously more correct, but I'd
already found a couple compaction-marker leaks so I was hoping to make any
regressions there obvious. I did the next best thing and added a timed loop
after which we give up and log.
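For illustration only, a timed loop of this kind might look like the sketch below. It is not the code in the branch: the class name MarkerWait, the compacting set, and the 10 ms poll interval are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Cassandra source.
class MarkerWait {
    // Stand-in for the set of sstables currently carrying a compaction marker.
    static final Set<String> compacting = ConcurrentHashMap.newKeySet();

    /**
     * Polls until no markers remain or the timeout elapses.
     * Returns true if every marker was cleared, false if we gave up.
     */
    static boolean waitForMarkersCleared(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!compacting.isEmpty()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                System.err.println("Gave up waiting for compactions to finish; still marked: " + compacting);
                return false;
            }
            try {
                Thread.sleep(10); // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

A leaked marker shows up here as a timeout and a log line rather than as an indefinite hang.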
I've also made pause and getNextBackgroundTask serialized, so we can guarantee
that after pause completes, no new tasks will be generated; or put another way,
pause can't run until in-progress tasks are done being created. This shouldn't
be necessary for correctness but it does make it easier to reason about.
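As a rough sketch of what serializing the two calls means (again hypothetical: the class PausableStrategy, its lock, and the resume and tasksCreated helpers are invented for the example and are not taken from the patch), both methods can take the same lock:

```java
import java.util.Optional;

// Hypothetical sketch; only pause and getNextBackgroundTask are names from the comment.
class PausableStrategy {
    private final Object lock = new Object();
    private boolean paused = false;
    private int created = 0;

    /** Creates a task unless paused. Runs under the same lock as pause(). */
    Optional<Runnable> getNextBackgroundTask() {
        synchronized (lock) {
            if (paused) {
                return Optional.empty();
            }
            created++;
            return Optional.of(() -> { /* compaction work would go here */ });
        }
    }

    /** Blocks until any in-progress task creation has finished, then stops new ones. */
    void pause() {
        synchronized (lock) {
            paused = true;
        }
    }

    void resume() {
        synchronized (lock) {
            paused = false;
        }
    }

    int tasksCreated() {
        synchronized (lock) {
            return created;
        }
    }
}
```

Because pause() cannot acquire the lock while a task is being created, once it returns no further tasks are handed out until resume() is called.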
Pushed to https://github.com/jbellis/cassandra/tree/3430-4, with fix for 2I
pause. (3430-3 tried another approach that didn't pan out...)
was (Author: jbellis):
You're right, there's a race because I assumed that if
getNextBackgroundTask is already running on CompactionExecutor, it will be part
of the compaction activity we wait to finish. Unfortunately I don't see a good
way to actually make that true; collector.beginCompaction doesn't run until
we're well into the task (because we can't create the necessary , and
finishCompaction runs before we unmark.
So instead I'm using the compaction marker itself as an indication that we've
successfully cancelled everything. Which is obviously more correct, but I'd
already found a couple compaction-marker leaks so I was hoping to avoid making
more of those obvious. I did the next best thing and added a timed loop after
which we give up and log.
I've also made pause and getNextBackgroundTask serialized, so we can guarantee
that after pause completes, no new tasks will be generated; or put another way,
pause can't run until in-progress tasks are done being created. This shouldn't
be necessary for correctness but it does make it easier to reason about.
Pushed to https://github.com/jbellis/cassandra/tree/3430-4, with fix for 2I
pause. (3430-3 tried another approach that didn't pan out...)
> Break Big Compaction Lock apart
> -------------------------------
>
> Key: CASSANDRA-3430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3430
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Priority: Minor
> Labels: compaction
> Fix For: 2.0
>
> Attachments: 3430-1.0.txt, 3430-1.1.txt, 3430-v2.txt, 3430-v3.txt
>
>
--
This message is automatically generated by JIRA.