[
https://issues.apache.org/jira/browse/CASSANDRA-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513294#comment-16513294
]
Kurt Greaves commented on CASSANDRA-14423:
------------------------------------------
In
{{org.apache.cassandra.db.compaction.CompactionManager#submitAntiCompaction}}
we create a transaction over all SSTables included in the repair (including
repaired SSTables when doing full repair) and pass that through to
{{performAntiCompaction}} in which two things can happen:
1. The SSTable is fully contained within the repairing ranges, and in that case
we mutate repairedAt to the current time of repair and add it to
{{mutatedRepairStatuses}}
2. The SSTable isn't fully contained within the repairing ranges (highly likely
with vnodes, or with single tokens when the cluster has more than RF nodes). In
this case we don't add the _already repaired_ SSTable to
{{mutatedRepairStatuses}}.
We then remove all SSTables from the transaction in {{mutatedRepairStatuses}}
[here|https://github.com/apache/cassandra/blob/191ad7b87a4ded26be4ab0bd192ef676f059276c/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L704].
If *2* occurred, the already repaired SSTables were not in
{{mutatedRepairStatuses}} and thus didn't get removed from the transaction.
When {{txn.finish()}} is called, they get removed from the CompactionStrategy's
view of SSTables via
{{org.apache.cassandra.db.lifecycle.LifecycleTransaction#doCommit}} calling
{{Tracker#notifySSTablesChanged}}, which will not include the already repaired
SSTables.
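To illustrate the sequence above, here is a toy model using plain Java collections. It is only a sketch: {{Table}}, {{viewAfterAntiCompaction}} and the "-repaired"/"-unrepaired" output names are made up for this comment and are not Cassandra classes or methods. It assumes the post-CASSANDRA-13153 behaviour in which already repaired SSTables are no longer rewritten.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model only: none of these types are Cassandra classes.
final class AntiCompactionViewModel {

    /** Hypothetical stand-in for an SSTable taking part in a repair. */
    static final class Table {
        final String name;
        final boolean repaired;       // already marked repaired before this repair
        final boolean fullyContained; // entirely inside the repairing ranges

        Table(String name, boolean repaired, boolean fullyContained) {
            this.name = name;
            this.repaired = repaired;
            this.fullyContained = fullyContained;
        }
    }

    /** Returns the names the compaction strategy still sees after the transaction finishes. */
    static Set<String> viewAfterAntiCompaction(List<Table> sstables) {
        Set<String> view = new TreeSet<>();
        for (Table t : sstables) {
            view.add(t.name);
        }

        // The transaction covers every SSTable in the repair, repaired or not.
        Set<String> txn = new TreeSet<>(view);
        Set<String> mutatedRepairStatuses = new TreeSet<>();
        Set<String> rewritten = new TreeSet<>();

        for (Table t : sstables) {
            if (t.fullyContained) {
                // Case 1: only repairedAt changes, the file itself stays.
                mutatedRepairStatuses.add(t.name);
            } else if (!t.repaired) {
                // Unrepaired and partially contained: rewritten into two new SSTables.
                rewritten.add(t.name + "-repaired");
                rewritten.add(t.name + "-unrepaired");
            }
            // Case 2 for an already repaired SSTable: nothing happens to it here.
        }

        // Mutated SSTables are taken out of the transaction before it finishes.
        txn.removeAll(mutatedRepairStatuses);

        // Finishing the transaction drops whatever is still in it from the view
        // and adds only the rewritten SSTables.
        view.removeAll(txn);
        view.addAll(rewritten);
        return view;
    }

    public static void main(String[] args) {
        List<Table> sstables = Arrays.asList(
                new Table("a", false, true),
                new Table("b", false, false),
                new Table("c", true, true),
                new Table("d", true, false));
        System.out.println(viewAfterAntiCompaction(sstables));
    }
}
```

In this model "d" (already repaired, only partially contained) is missing from the returned view and nothing replaces it, which mirrors the SSTables that stop being compaction candidates.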
CASSANDRA-13153 brought this bug to light because up until that point we _were_
anti-compacting already repaired SSTables. Upon anti-compaction (rewrite) the
new SSTable would be added to the transaction, the old SSTable would be removed
as usual, and the new SSTable would take its place.
Seeing as the existing consensus seems to be that there's no real value at the
moment in mutating repaired times on already repaired SSTables, I think the best
solution is to not include the repaired SSTables in the transaction in the
first place. This matches how trunk currently works, is a lot cleaner, and is
what my patch mentioned above does. The alternative would be to remove them
from the transaction regardless of whether they were mutated, but this seems
pointless considering we don't do anything with them. If we ever decide there
is value in updating repairedAt on already repaired SSTables, we can add it
back and handle it then.
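As a rough sketch of that approach, the toy model below builds the transaction from unrepaired SSTables only. Again this uses plain Java collections; {{Table}}, {{viewAfterAntiCompaction}} and the output names are hypothetical and this is not the actual patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model only: none of these types are Cassandra classes.
final class UnrepairedOnlyTxnModel {

    /** Hypothetical stand-in for an SSTable taking part in a repair. */
    static final class Table {
        final String name;
        final boolean repaired;       // already marked repaired before this repair
        final boolean fullyContained; // entirely inside the repairing ranges

        Table(String name, boolean repaired, boolean fullyContained) {
            this.name = name;
            this.repaired = repaired;
            this.fullyContained = fullyContained;
        }
    }

    /** Returns the names the compaction strategy still sees after the transaction finishes. */
    static Set<String> viewAfterAntiCompaction(List<Table> sstables) {
        Set<String> view = new TreeSet<>();
        Set<String> txn = new TreeSet<>();
        for (Table t : sstables) {
            view.add(t.name);
            // Assumed fix: already repaired SSTables never enter the transaction.
            if (!t.repaired) {
                txn.add(t.name);
            }
        }

        Set<String> mutatedRepairStatuses = new TreeSet<>();
        Set<String> rewritten = new TreeSet<>();
        for (Table t : sstables) {
            if (t.repaired) {
                continue; // untouched, so it cannot be dropped by the transaction
            }
            if (t.fullyContained) {
                mutatedRepairStatuses.add(t.name);
            } else {
                rewritten.add(t.name + "-repaired");
                rewritten.add(t.name + "-unrepaired");
            }
        }

        txn.removeAll(mutatedRepairStatuses);
        view.removeAll(txn);
        view.addAll(rewritten);
        return view;
    }

    public static void main(String[] args) {
        List<Table> sstables = Arrays.asList(
                new Table("a", false, true),
                new Table("b", false, false),
                new Table("c", true, true),
                new Table("d", true, false));
        System.out.println(viewAfterAntiCompaction(sstables));
    }
}
```

Here the already repaired SSTables "c" and "d" both stay in the view because the transaction never owned them; the trade-off is that their repairedAt is left as it was.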
> SSTables stop being compacted
> -----------------------------
>
> Key: CASSANDRA-14423
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14423
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Kurt Greaves
> Assignee: Kurt Greaves
> Priority: Major
> Fix For: 2.2.13, 3.0.17, 3.11.3
>
>
> So seeing a problem in 3.11.0 where SSTables are being lost from the view and
> not being included in compactions/as candidates for compaction. It seems to
> get progressively worse until there's only 1-2 SSTables in the view which
> happen to be the most recent SSTables and thus compactions completely stop
> for that table.
> The SSTables seem to still be included in reads, just not compactions.
> The issue can be fixed by restarting C*, as it will reload all SSTables into
> the view, but this is only a temporary fix. User defined/major compactions
> still work - not clear if they include the result back in the view, but this
> is not a good workaround.
> This also results in a discrepancy between SSTable count and SSTables in
> levels for any table using LCS.
> {code:java}
> Keyspace : xxx
> Read Count: 57761088
> Read Latency: 0.10527088681224288 ms.
> Write Count: 2513164
> Write Latency: 0.018211106398149903 ms.
> Pending Flushes: 0
> Table: xxx
> SSTable count: 10
> SSTables in each level: [2, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 894498746
> Space used (total): 894498746
> Space used by snapshots (total): 0
> Off heap memory used (total): 11576197
> SSTable Compression Ratio: 0.6956629530569777
> Number of keys (estimate): 3562207
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 87
> Local read count: 57761088
> Local read latency: 0.108 ms
> Local write count: 2513164
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 86.33
> Bloom filter false positives: 43
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 8046104
> Bloom filter off heap memory used: 8046024
> Index summary off heap memory used: 3449005
> Compression metadata off heap memory used: 81168
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 5722
> Compacted partition mean bytes: 175
> Average live cells per slice (last five minutes): 1.0
> Maximum live cells per slice (last five minutes): 1
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
> {code}
> Also for STCS we've confirmed that the SSTable count will be different from
> the number of SSTables reported in the compaction buckets. In the example
> below there are only 3 SSTables in a single bucket - no more are listed for
> this table. Compaction thresholds haven't been modified for this table and
> it's a very basic KV schema.
> {code:java}
> Keyspace : yyy
> Read Count: 30485
> Read Latency: 0.06708991307200263 ms.
> Write Count: 57044
> Write Latency: 0.02204061776873992 ms.
> Pending Flushes: 0
> Table: yyy
> SSTable count: 19
> Space used (live): 18195482
> Space used (total): 18195482
> Space used by snapshots (total): 0
> Off heap memory used (total): 747376
> SSTable Compression Ratio: 0.7607394576769735
> Number of keys (estimate): 116074
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 39
> Local read count: 30485
> Local read latency: NaN ms
> Local write count: 57044
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 79.76
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.00000
> Bloom filter space used: 690912
> Bloom filter off heap memory used: 690760
> Index summary off heap memory used: 54736
> Compression metadata off heap memory used: 1880
> Compacted partition minimum bytes: 73
> Compacted partition maximum bytes: 124
> Compacted partition mean bytes: 96
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 0
> {code}
> {code:java}
> Apr 27 03:10:39 cassandra[9263]: TRACE o.a.c.d.c.SizeTieredCompactionStrategy
> Compaction buckets are
> [[BigTableReader(path='/var/lib/cassandra/data/yyy/yyy-5f7a2d60e4a811e6868a8fd39a64fd59/mc-67168-big-Data.db'),
>
> BigTableReader(path='/var/lib/cassandra/data/yyy/yyy-5f7a2d60e4a811e6868a8fd39a64fd59/mc-67167-big-Data.db'),
>
> BigTableReader(path='/var/lib/cassandra/data/yyy/yyy-5f7a2d60e4a811e6868a8fd39a64fd59/mc-67166-big-Data.db')]]
> {code}
> Also for every LCS table we're seeing the following warning being spammed
> (seems to be in line with anticompaction spam):
> {code:java}
> Apr 26 21:30:09 cassandra[9263]: WARN o.a.c.d.c.LeveledCompactionStrategy
> Live sstable
> /var/lib/cassandra/data/xxx/xxx-8c3ef9e0e3fc11e6868a8fd39a64fd59/mc-79024-big-Data.db
> from level 0 is not on corresponding level in the leveled manifest. This is
> not a problem per se, but may indicate an orphaned sstable due to a failed
> compaction not cleaned up properly.{code}
> This is a vnodes cluster with 256 tokens per node, and the only thing that
> seems like it could be causing issues is anticompactions.
> CASSANDRA-14079 might be related but doesn't quite describe the same issue,
> and in this case we're using only a single disk for data. Have yet to
> reproduce but figured worth reporting here first.