[
https://issues.apache.org/jira/browse/CASSANDRA-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291895#comment-14291895
]
Marcus Eriksson commented on CASSANDRA-8683:
--------------------------------------------
from irc:
{noformat}
15:27 < marcuse> belliottsmith: we create a new sstable reader over the same
file, the reference counter is not transfered to the new file, so if i bump the
reference count on the file i hold a reference to, it will not affect the
latest one
15:27 < marcuse> does it make sense/
15:27 < marcuse> * transfered to the new SSTR instance
15:28 < belliottsmith> but you should have taken a reference against the one
you're using, no?
15:28 < marcuse> belliottsmith: no, we don't want to hold references during the
whole duration of the repair (hours)
15:28 < belliottsmith> each instance is distinct, but so long as you take a
reference against the one you're using, it shouldn't be a problem...
15:28 < marcuse> so, if a file is gone, we simply remove it from the set and
dont anticompact it
15:29 < belliottsmith> ah
15:29 < belliottsmith> hmm. that is tricky, since we need to repeatedly refetch
the set of files we want to use, since we could have new data replacing it as
well, surely?
15:30 < belliottsmith> i don't really see a way around holding a reference for
the lifetime of the repair. but if we stream oldest files first we can drop
references to them as we go, no>?
15:30 < marcuse> no, for incremental repairs we repair best-effort, if a file
is gone, it means it has been compacted away and will get repaired next time
around
15:31 < belliottsmith> so where/when do we grab our reference to the sstable
then?
15:31 < belliottsmith> because in that case it sounds like we should simply be
failing to grab our reference count
15:31 < marcuse>
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ActiveRepairService.java#L409
15:32 < marcuse> and yes, we probably should fail, but we don't
15:32 < belliottsmith> that code looks a bit odd. why do we check for data
component first? shouldn't an attempt to acquire the ref be sufficient?
(clearly it isn't here though, since it's not removing it from the set)
15:33 < marcuse> yes, it should, but i don't think we decrease ref count when
we replace an instance?
15:34 < marcuse> ie, when we open early and have a new instance, we should
decrease the reference on the old file
15:34 < marcuse> *old instance\
15:34 < belliottsmith> we have to
15:34 < belliottsmith> otherwise it would never get deleted
15:34 < marcuse> but its the same underlying file right? it will get deleted by
the new instance
15:34 < belliottsmith> if we don't that's definitely a bug, but i'm pretty sure
we do
15:34 < belliottsmith> but lifetimes don't expire in order necessarily
15:35 < belliottsmith> so the older files ref count to zero, then remove
themselves from the linked-list of replacements
15:35 < belliottsmith> if there's nobody left, they delete the file
15:36 < belliottsmith> this all is in dire need of cleaning up in 8568
{noformat}
> Incremental repairs broken with early opening of compaction results
> -------------------------------------------------------------------
>
> Key: CASSANDRA-8683
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8683
> Project: Cassandra
> Issue Type: Bug
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Fix For: 2.1.3
>
>
> Incremental repairs holds a set of the sstables it started the repair on (we
> need to know which sstables were actually validated to be able to anticompact
> them). This includes any tmplink files that existed when the compaction
> started (if we wouldn't include those, we would miss data since we move the
> start point of the existing non-tmplink files)
> With CASSANDRA-6916 we swap out those instances with new ones
> (SSTR.cloneWithNewStart / SSTW.openEarly), meaning that the underlying file
> can get deleted even though we hold a reference.
> This causes the unit test error:
> http://cassci.datastax.com/job/trunk_utest/1330/testReport/junit/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testValidationMultipleSSTablePerLevel/
> (note that it only fails on trunk though, in 2.1 we don't hold references to
> the repairing files for non-incremental repairs, but the bug should exist in
> 2.1 as well)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)