[ https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211081#comment-16211081 ]

Joel Knighton commented on CASSANDRA-13873:
-------------------------------------------

You're correct that cancelling will also finish the txn and allow operations to
select and reference canonical sstables. In the specific repro that Jake
provided (multiple scrubs over the same cfs, an admittedly somewhat artificial
case), we try to select and reference canonical sstables in the snapshot
before the original scrub compaction is cancelled, so the new scrubs hang
until the original scrub finishes.
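To make the ordering concrete, here is a minimal Java sketch under loudly stated assumptions: `OrderingSketch`, `Reader`, `ScrubTxn` and `selectAndReference` are hypothetical stand-ins, not Cassandra's tracker or transaction APIs; a reader's `released` flag models a dropped self reference; and the sketch assumes that finishing the original scrub's transaction (here via `cancel()`) is what removes the released reader from the canonical set. A selection attempted before the cancel fails on every retry, while one attempted after it succeeds.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the ordering problem; none of these classes are Cassandra APIs.
final class OrderingSketch {
    static final class Reader {
        final String name;
        boolean released; // models a reader whose self reference has already been dropped
        Reader(String name) { this.name = name; }
    }

    // Stand-in for the canonical set of readers held by the tracker.
    final Set<Reader> canonical = new LinkedHashSet<>();

    // Stand-in for the original scrub's transaction. It has already released its
    // original reader, but that reader stays canonical until the txn finishes.
    static final class ScrubTxn {
        private final Set<Reader> canonical;
        final Reader original;
        ScrubTxn(Set<Reader> canonical, Reader original) {
            this.canonical = canonical;
            this.original = original;
            original.released = true;
        }
        // Assumption: finishing the txn (here via cancel) drops the released reader.
        void cancel() { canonical.remove(original); }
    }

    // A new operation tries to reference every canonical reader. Retries are
    // bounded only so the sketch terminates; it returns null when it gives up.
    List<Reader> selectAndReference(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<Reader> captured = new ArrayList<>();
            for (Reader r : canonical) {
                if (r.released) { captured = null; break; } // cannot reference it, retry
                captured.add(r);
            }
            if (captured != null) return captured;
        }
        return null;
    }

    public static void main(String[] args) {
        OrderingSketch s = new OrderingSketch();
        Reader mc5 = new Reader("mc-5");
        Reader mc20 = new Reader("mc-20");
        s.canonical.add(mc5);
        s.canonical.add(mc20);
        ScrubTxn original = new ScrubTxn(s.canonical, mc5);
        System.out.println(s.selectAndReference(3) == null); // prints true: select before cancel fails
        original.cancel();
        System.out.println(s.selectAndReference(3).size());  // prints 1: only mc-20 is captured
    }
}
```

In the scenario described above, the new scrubs have no retry bound, so they keep spinning until the original scrub finishes.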

It'd be great if you could review. I'm admittedly very unfamiliar with this
part of the code, so I expect my initial patch is only a rough sketch of the
eventual solution.

As far as criticality goes, I could go either way. I know of no situations in
which this currently causes data loss or permanent deadlocks, but it can cause
operations that reference canonical sstables to hang for long periods of time.

> Ref bug in Scrub
> ----------------
>
>                 Key: CASSANDRA-13873
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>            Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't
> happen on 3.0.X. I'm not sure whether this also happens with compactions,
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 NoSpamLogger.java:97 - Spinning trying to capture readers
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>
> This released table has a selfRef count of 0 but is still in the Tracker.
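A selfRef count of 0 matters because a released reader cannot be referenced again. The following is a minimal Java sketch of that try-reference rule, assuming a simple `AtomicInteger` counter that starts at 1 for the reader's own reference; `CountedReader` is a hypothetical class and is not Cassandra's `Ref` implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted reader; not Cassandra's Ref/RefCounted code.
final class CountedReader {
    // Starts at 1: the reader's own (self) reference.
    private final AtomicInteger count = new AtomicInteger(1);

    // Takes an extra reference only while the count is positive. Once it has
    // reached 0 the reader is released and every later attempt fails.
    boolean tryRef() {
        while (true) {
            int current = count.get();
            if (current <= 0) return false;
            if (count.compareAndSet(current, current + 1)) return true;
        }
    }

    // Decrementing below zero indicates a double release.
    void release() {
        if (count.decrementAndGet() < 0) throw new IllegalStateException("double release");
    }

    int count() { return count.get(); }

    public static void main(String[] args) {
        CountedReader reader = new CountedReader();
        System.out.println(reader.tryRef()); // prints true: count 1 -> 2
        reader.release();                    // 2 -> 1
        reader.release();                    // 1 -> 0, self reference dropped
        System.out.println(reader.tryRef()); // prints false: a released reader stays released
    }
}
```

If a tracker still lists a reader in this state, every attempt to reference the whole canonical set fails on that reader, which is consistent with the repeated "Spinning trying to capture readers" warning above.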



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)