[ 
https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-8399:
---------------------------------------
    Attachment: 8399_v2.txt

bq. this should be ensured by the operation that creates the scanner, not the 
scanner itself. We are leaking resource management into the scanner code that 
shouldn't be there.
I agree.  At the time I first changed this code, I wasn't aware of the OpOrder 
-> SSTableScanner protection relationship so the design was more appropriate.

bq. We should remove the reference counting all together from this area of 
code. The fact that we were blindly acquiring before means that we knew it had 
to succeed, meaning that it also was not necessary.
I partially agree.  While we don't strictly need to ref count to protect, we do 
need to document and/or codify the relationship between scanners and sstr's - 
the change in the CompactionTask with CASSANDRA-7932 uncovered the fact that we 
were relying on an undocumented ordering relationship between SSTR closing and 
SSTableScanner closing for the operation to work on Windows.  This relationship 
needs to be formalized if we're to move forward with supporting mmap'ed I/O on 
Windows as the same strict file close ordering restrictions apply in that 
context (i.e. it's not fixed by CASSANDRA-4050).

bq. Having two mechanisms for resource management is fine, so long as we don't 
mix them. A macro operation needs to be protected by one or the other, and 
sub-operations should rely on these macro operations to ensure their protection.
I believe this claim needs more context to be justified; having two methods to 
accomplish the same thing is error prone, considerably more difficult to 
maintain and reason about, and needs to be offset by other substantial gains to 
justify duplication.  In this case (OpOrder vs. ref-counting) I believe it's 
justified from the performance gains, however the code-structure (methods / 
flow) doesn't make this relationship immediately clear to others coming into 
the code modified for CASSANDRA-5549 / CASSANDRA-6689.  Clear in retrospect, 
but the current structure of the code doesn't "self-document" this relationship 
at this time.  Perhaps wrapping an interface around readOrdering in SSTR rather 
than having it be public and accessible would help clarify the object's role 
when working with loosely related code.

Regardless - 8399_v2.txt attached that moves the scope closing in 
CompactionTask, documents the temporal coupling, and removes ref counting from 
within SSTableScanner.  Errors on Windows remain suppressed w/regards to 
compaction file access and the dtest ran clean on 50 runs.

> Reference Counter exception when dropping user type
> ---------------------------------------------------
>
>                 Key: CASSANDRA-8399
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8399
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>             Fix For: 2.1.3
>
>         Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log, 
> ubuntu-8399.log
>
>
> When running the dtest 
> {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} 
> with the current 2.1-HEAD code, very frequently, but not always, when 
> dropping a type, the following exception is seen:{code}
> ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - 
> Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.AssertionError: Reference counter -1 for 
> /var/folders/v3/z4wf_34n1q506_xjdy49gb780000gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-sche
> ma_keyspaces-ka-14-Data.db
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662)
>  ~[main/:na]
>         at 
> org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) 
> ~[main/:na]
>         at 
> org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) 
> ~[main/:na]
>         at 
> org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) 
> ~[main/:na]
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744)
>  ~[main/:na]
>         at 
> org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731)
>  ~[main/:na]
>         at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) 
> ~[main/:na]
>         at 
> org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) 
> ~[main/:na]
>         at 
> org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) 
> ~[main/:na]
>         at 
> org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49)
>  ~[main/:na]
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[main/:na]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_67]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_67]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_67]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_67]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]{code}
> Log of the node with the error is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to