[
https://issues.apache.org/jira/browse/CASSANDRA-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445374#comment-16445374
]
Lerh Chuan Low commented on CASSANDRA-10540:
--------------------------------------------
Hey Marcus, thanks for getting back so quickly :)
The big motivation for us is repairs: when vnodes are enabled, every SSTable
contains data from many vnodes, so when an (incremental) repair happens, the
range it is interested in gets anticompacted out, then the next range gets
anticompacted out, and so on. RACS solves that big pain.
Besides making the number of SSTables touched per read much nicer, it's like
LCS on steroids. I think there are other benefits to keeping each SSTable
within a single token range, but I can't quite remember them off the top of my
head.
So I am hoping it doesn't come to grouping the vnodes, unless as a last
resort.
Currently it looks like you create a RACS for each of the
repaired/unrepaired/pending-repair sets, and each RACS keeps track of the
compaction strategies it is in charge of (which are all of the same class). The
CS instances are lazily initialized (so that's a win right there) until needed.
It seems that the reason we want so many CS instances is so that each of them
can keep track of its own SSTables (which all belong to that single token
range).
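The lazy initialization described above could be sketched roughly like this; a
minimal illustration only, assuming a per-range map keyed by a range id, with
names (LazyPerRangeStrategies, forRange) that are made up here and not
Cassandra's actual classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: RACS only creates a wrapped strategy instance the
// first time a given range actually needs one, and reuses it afterwards.
class LazyPerRangeStrategies<S> {
    private final Map<Integer, S> strategies = new HashMap<>(); // rangeId -> strategy
    private final Supplier<S> factory;

    LazyPerRangeStrategies(Supplier<S> factory) {
        this.factory = factory;
    }

    // Create on first use, reuse on subsequent calls for the same range.
    S forRange(int rangeId) {
        return strategies.computeIfAbsent(rangeId, id -> factory.get());
    }

    // How many ranges have actually needed a strategy so far.
    int instantiated() {
        return strategies.size();
    }
}
```

With 256 vnodes you would still end up with up to 256 instances, but only as
ranges are actually touched.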
How about if RACS doesn't instantiate the individual CS instances? It would
keep track of all the SSTables in the CF like other CS instances do; only the
logic for choosing which SSTables to involve in a compaction would live in
RACS. RACS would check L0 first, and if there is nothing there, an L1 pass
would group the SSTables by range, ask the underlying/wrapped CS for its next
background task on each group, and submit those.
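The grouping step could look something like the following; a sketch under the
assumption that each SSTable can be tagged with the single vnode range it
covers (SSTable, rangeId, and the candidate-picking rule are illustrative
stand-ins, not the real API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of RACS choosing compaction candidates itself:
// bucket the flat SSTable list by range, then hand each bucket to the
// wrapped strategy instead of keeping one strategy object per range.
class RangeGrouping {
    static class SSTable {
        final int rangeId;    // which vnode range this SSTable covers
        final long sizeBytes;
        SSTable(int rangeId, long sizeBytes) {
            this.rangeId = rangeId;
            this.sizeBytes = sizeBytes;
        }
    }

    // "L1" pass: group the SSTables by range id.
    static SortedMap<Integer, List<SSTable>> byRange(List<SSTable> sstables) {
        SortedMap<Integer, List<SSTable>> groups = new TreeMap<>();
        for (SSTable s : sstables)
            groups.computeIfAbsent(s.rangeId, k -> new ArrayList<>()).add(s);
        return groups;
    }

    // Next background task: the first range with more than one SSTable
    // (a deliberately simplified stand-in for the wrapped CS's own logic).
    static List<SSTable> nextCandidates(List<SSTable> sstables) {
        for (List<SSTable> group : byRange(sstables).values())
            if (group.size() > 1) return group;
        return Collections.emptyList();
    }
}
```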
The downside of this is calculating the grouping each time we ask for the next
background task. We could also store it in memory in the form of a manifest,
like in LCS: an array with the SSTables for each range. That beats having 256
instances, but we would still have a 256-sized array in memory, I guess. It
just seems so strikingly similar to an LCS restricted to just L0 and L1.
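The cached-manifest idea amounts to something like this; again just a sketch,
assuming one slot per vnode range kept up to date as SSTables are added, with
RangeManifest a made-up name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-range manifest, per the "256-sized array" idea: the
// grouping is maintained incrementally instead of being recomputed on
// every next-background-task call.
class RangeManifest {
    private final List<String>[] slots;  // slot i = SSTable names for range i

    @SuppressWarnings("unchecked")
    RangeManifest(int numRanges) {
        slots = new List[numRanges];
        for (int i = 0; i < numRanges; i++)
            slots[i] = new ArrayList<>();
    }

    void add(int rangeId, String sstableName) {
        slots[rangeId].add(sstableName);
    }

    List<String> sstablesFor(int rangeId) {
        return Collections.unmodifiableList(slots[rangeId]);
    }

    int ranges() {
        return slots.length;
    }
}
```

With 256 vnodes the array itself is small; the cost is mostly bookkeeping to
keep it consistent as SSTables come and go.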
A final thought: is the memory footprint actually significant enough that we
would rather group the vnodes further than just bite the bullet? The gains
from having each SSTable cover a single range are large, simple is a feature,
and RACS is customizable.
Please excuse my ignorance if none of these suggestions make sense or work;
I'm still not very confident with the code base...
> RangeAwareCompaction
> --------------------
>
> Key: CASSANDRA-10540
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10540
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Major
> Labels: compaction, lcs, vnodes
> Fix For: 4.x
>
>
> Broken out from CASSANDRA-6696, we should split sstables based on ranges
> during compaction.
> Requirements;
> * don't create tiny sstables - keep them bunched together until a single vnode
> is big enough (configurable how big that is)
> * make it possible to run existing compaction strategies on the per-range
> sstables
> We should probably add a global compaction strategy parameter that states
> whether this should be enabled or not.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]