[ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262498#comment-14262498 ]
Joshua McKenzie commented on CASSANDRA-8399:
--------------------------------------------
While I agree that the Right Thing here seems to be to protect the entire
compaction operation by holding a reference, I'm not sure the 2.X line is the
appropriate place for this change at the DataTracker level. Acquiring and
releasing within a single SSTableScanner is a cleanly tied-together RAII
operation that should be an "invisible" change from a logical-flow / API
perspective. Pushing that operation into markCompacting and unmarkCompacting,
however, means over 10 upstream callers of those methods have an assumption
(and contract) changed on them: namely, markCompacting will now also return
false if it fails to acquire references on the SSTables in question. Correct
me if I'm wrong on that, or if there's a more appropriate place than the
DataTracker to make this change; I haven't worked much in this section of the
code base.
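To make the contract change concrete, here is a minimal Java sketch of a markCompacting that also pins each sstable. It is not the actual DataTracker code: `Reader`, `Tracker`, and their methods are simplified, hypothetical stand-ins (a plain HashSet instead of DataTracker's view, and a count that starts at 1 for the tracker's own reference). The new behavior is the second loop: if any reference cannot be acquired, the references already taken are released and the method returns false, a failure mode callers did not previously have to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class MarkCompactingSketch {

    /** Hypothetical stand-in for SSTableReader: a ref-counted file handle. */
    static final class Reader {
        final String name;
        // Starts at 1: the "live" reference held by the tracker itself.
        private final AtomicInteger refs = new AtomicInteger(1);

        Reader(String name) { this.name = name; }

        /** Succeeds only while the reader is still live (count > 0). */
        boolean acquireReference() {
            while (true) {
                int n = refs.get();
                if (n <= 0) return false;                 // already fully released
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        void releaseReference() {
            int n = refs.decrementAndGet();
            if (n < 0) throw new AssertionError("Reference counter " + n + " for " + name);
        }

        int refCount() { return refs.get(); }
    }

    /** Hypothetical tracker in which marking also pins every sstable. */
    static final class Tracker {
        private final Set<Reader> compacting = new HashSet<>();

        synchronized boolean markCompacting(Collection<Reader> readers) {
            // Pre-existing contract: fail if anything is already being compacted.
            for (Reader r : readers)
                if (compacting.contains(r)) return false;

            // New failure mode: fail if any reference cannot be acquired,
            // rolling back the references taken so far.
            List<Reader> acquired = new ArrayList<>();
            for (Reader r : readers) {
                if (!r.acquireReference()) {
                    for (Reader a : acquired) a.releaseReference();
                    return false;
                }
                acquired.add(r);
            }
            compacting.addAll(readers);
            return true;
        }

        synchronized void unmarkCompacting(Collection<Reader> readers) {
            for (Reader r : readers)
                if (compacting.remove(r)) r.releaseReference();
        }
    }

    public static void main(String[] args) {
        Reader a = new Reader("a-Data.db");
        Reader b = new Reader("b-Data.db");
        Tracker tracker = new Tracker();

        b.releaseReference();                                         // b is obsolete: count 0
        System.out.println(tracker.markCompacting(Arrays.asList(a, b))); // false
        System.out.println(a.refCount());                             // 1: rollback released a
        System.out.println(tracker.markCompacting(Arrays.asList(a)));    // true
        System.out.println(a.refCount());                             // 2: pinned while compacting
        tracker.unmarkCompacting(Arrays.asList(a));
        System.out.println(a.refCount());                             // 1
    }
}
```

Every caller that previously treated a false return as "someone else is compacting these" would now also see false for sstables that are simply gone, which is the contract change in question.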
A naive change to DataTracker.markCompacting leads to infinite loops
(apparently from multiple insertion points), so we'd need to go upstream and
rework the various marking operations to accommodate entries in the
SSTableReader collections that are "unmarkable". My preference here would be
to go with _v2, which resolves the ordering problems introduced in
CASSANDRA-7932 without introducing a ref count on the read path, and to create
a separate ticket for 3.0 to pursue the more invasive change of reference
counting all compacting sstables.
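One plausible way such a loop arises can be shown with a hypothetical "retry until marked" caller. This is a minimal sketch, not Cassandra's code: `SSTable`, `markCompacting`, and `attemptsUntilMarked` are simplified stand-ins invented for illustration, and the `maxAttempts` cap exists only so the demo terminates. If candidate selection does not know about references, an sstable that can no longer be referenced but is still listed as live gets picked on every pass, so marking never succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class RetryLoopSketch {

    /** Hypothetical, simplified sstable: only tracks whether it can still be referenced. */
    static final class SSTable {
        final String name;
        final boolean referenceable;   // false once its last reference is gone

        SSTable(String name, boolean referenceable) {
            this.name = name;
            this.referenceable = referenceable;
        }
    }

    /** Stand-in for a markCompacting that must acquire a reference on every candidate. */
    static boolean markCompacting(List<SSTable> candidates) {
        for (SSTable s : candidates)
            if (!s.referenceable) return false;
        return true;
    }

    /**
     * Hypothetical caller that re-selects candidates and retries until marking
     * succeeds. Returns the attempt that succeeded, or -1 after maxAttempts;
     * an unbounded version of this loop would never exit in the failing case.
     */
    static int attemptsUntilMarked(List<SSTable> live, Predicate<SSTable> eligible, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<SSTable> candidates = new ArrayList<>();
            for (SSTable s : live)
                if (eligible.test(s)) candidates.add(s);
            if (markCompacting(candidates)) return attempt;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<SSTable> live = Arrays.asList(
                new SSTable("a-Data.db", true),
                new SSTable("b-Data.db", false));   // released, but still listed as live

        // Naive selection: every pass picks the same set and fails the same way.
        System.out.println(attemptsUntilMarked(live, s -> true, 1000));            // -1
        // Selection that skips sstables it cannot reference succeeds on the first pass.
        System.out.println(attemptsUntilMarked(live, s -> s.referenceable, 1000)); // 1
    }
}
```

The second call hints at the kind of upstream change the marking callers would need: selection logic that accounts for "unmarkable" entries rather than assuming every listed sstable can be marked.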
As you've mentioned several times, reference counting is tricky to get right.
Promoting it up to the data tracker's compaction-marking abstraction strikes
me as a risky change when we already have quite a few failing unit tests and
open bugs to resolve on 2.X. I definitely think it's the right thing
long-term, though.
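For contrast, the scanner-scoped RAII approach can be sketched as a scanner that takes one reference when opened and releases it exactly once on close. This is a hedged illustration, not SSTableScanner's implementation: `Reader`, `Scanner`, and the idempotent-close guard are hypothetical. It also shows one way a count can reach -1 (a release with no matching acquire), which has the same shape as the "Reference counter -1" assertion in the quoted issue; it does not claim that this is the root cause here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ScannerRefSketch {

    /** Hypothetical ref-counted reader that asserts when its count goes negative. */
    static final class Reader {
        private final AtomicInteger refs = new AtomicInteger(1);   // the tracker's own reference

        boolean acquireReference() {
            while (true) {
                int n = refs.get();
                if (n <= 0) return false;                 // already fully released
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        void releaseReference() {
            int n = refs.decrementAndGet();
            if (n < 0) throw new AssertionError("Reference counter " + n);
        }

        int refCount() { return refs.get(); }
    }

    /** Hypothetical scanner that pins its reader for exactly its own lifetime. */
    static final class Scanner implements AutoCloseable {
        private final Reader reader;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        Scanner(Reader reader) {
            if (!reader.acquireReference())
                throw new IllegalStateException("reader already released");
            this.reader = reader;
        }

        @Override
        public void close() {
            // Release at most once, even if close() is reached from several paths
            // (e.g. an enclosing iterator's close plus an explicit close).
            if (closed.compareAndSet(false, true))
                reader.releaseReference();
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader();
        try (Scanner scanner = new Scanner(reader)) {
            System.out.println(reader.refCount());   // 2 while the scan is open
            scanner.close();                          // an extra, early close
        }                                             // try-with-resources closes again: no-op
        System.out.println(reader.refCount());       // 1: back to the tracker's reference

        // Unbalanced release: one more release than acquire drives the count negative.
        Reader other = new Reader();                  // count 1
        other.releaseReference();                     // count 0: tracker drops its reference
        try {
            other.releaseReference();                 // stray release, e.g. an unguarded second close
        } catch (AssertionError e) {
            System.out.println(e.getMessage());      // Reference counter -1
        }
    }
}
```

Because acquire and release live in one object, the invariant is local to the scanner; moving the same pairing into markCompacting/unmarkCompacting spreads it across every caller of those methods.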
> Reference Counter exception when dropping user type
> ---------------------------------------------------
>
> Key: CASSANDRA-8399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8399
> Project: Cassandra
> Issue Type: Bug
> Reporter: Philip Thompson
> Assignee: Joshua McKenzie
> Fix For: 2.1.3
>
> Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log,
> ubuntu-8399.log
>
>
> When running the dtest
> {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}}
> against the current 2.1-HEAD code, the following exception is seen very
> frequently, but not always, when dropping a type:{code}
> ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb780000gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-14-Data.db
>     at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na]
>     at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na]
>     at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na]
>     at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na]
>     at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na]
>     at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na]
>     at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na]
>     at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]{code}
> Log of the node with the error is attached.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)