[https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211081#comment-16211081]
Joel Knighton commented on CASSANDRA-13873:
-------------------------------------------
You're correct that cancelling will also finish the txn and allow operations to
select and reference canonical sstables. In the specific repro that Jake
provided, which involves multiple scrubs over the same cfs (an admittedly
somewhat artificial case), we try to select and reference canonical sstables
for the snapshot before cancelling the original scrub compaction, so the new
scrubs hang until the original scrub finishes.
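The hang can be illustrated with a minimal Java sketch of a select-and-reference retry loop. It assumes a simplified model in which each reader carries an atomic reference count that starts at 1 and can never be re-acquired once it drops to 0; `CountedReader`, `CaptureSketch` and `selectAndReference` are hypothetical names used only here, not Cassandra's `Ref` or `SSTableReader` API. A capture attempt succeeds only if every reader in the current canonical snapshot can be referenced, so while a released reader is still listed, every attempt fails and the caller has to retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only; not Cassandra's Ref/SSTableReader API.
class CaptureSketch {
    // Snapshot the canonical set and try to reference every reader in it.
    // Returns the referenced readers, or null if every one of maxAttempts
    // snapshots still contained a released reader.
    static List<CountedReader> selectAndReference(Supplier<List<CountedReader>> canonical,
                                                  int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<CountedReader> acquired = new ArrayList<>();
            boolean complete = true;
            for (CountedReader r : canonical.get()) {
                if (!r.tryRef()) {
                    complete = false; // released but still listed: this attempt fails
                    break;
                }
                acquired.add(r);
            }
            if (complete) {
                return acquired;
            }
            // Drop the references taken so far before retrying, so nothing leaks.
            for (CountedReader r : acquired) {
                r.release();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CountedReader live = new CountedReader("mc-20");
        CountedReader stale = new CountedReader("mc-5");
        stale.release(); // count reaches 0, but the view below still lists it
        List<CountedReader> staleView = List.of(live, stale);
        System.out.println(selectAndReference(() -> staleView, 3) == null); // true
        System.out.println(live.refCount()); // 1: partial references were undone
        List<CountedReader> got = selectAndReference(() -> List.of(live), 3);
        System.out.println(got.size()); // 1: capture succeeds once the stale reader is gone
    }
}

// Hypothetical ref-counted reader: the count starts at 1 (modeling a "self"
// reference) and, once it drops to 0, tryRef() can never succeed again.
class CountedReader {
    final String name;
    private final AtomicInteger refs = new AtomicInteger(1);

    CountedReader(String name) {
        this.name = name;
    }

    boolean tryRef() {
        while (true) {
            int current = refs.get();
            if (current <= 0) {
                return false;
            }
            if (refs.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void release() {
        refs.decrementAndGet();
    }

    int refCount() {
        return refs.get();
    }
}
```

The sketch bounds the retries with `maxAttempts` so that it terminates; the "Spinning trying to capture readers" warning in the report suggests the real loop instead keeps retrying until the view changes, which fits the observation that the new scrubs only proceed once the original scrub finishes.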
It would be great if you could review it. I'm admittedly very unfamiliar with
this part of the code, so I expect my initial patch is only a rough sketch of
the eventual solution.
As far as criticality goes, I could go either way. I know of no situations in
which this currently causes data loss or permanent deadlocks, but it can cause
operations that reference canonical sstables to hang for long periods of time.
> Ref bug in Scrub
> ----------------
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
> Issue Type: Bug
> Reporter: T Jake Luciani
> Assignee: Joel Knighton
> Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't
> happen on 3.0.X. I'm not sure whether this also happens with compactions,
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 NoSpamLogger.java:97 - Spinning trying to capture readers
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>
> This released table has a selfRef of 0 but is still in the Tracker.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)