[
https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226744#comment-14226744
]
Nikolai Grigoriev edited comment on CASSANDRA-8301 at 11/26/14 8:04 PM:
------------------------------------------------------------------------
The logic I have built is very simple, and it probably has some fundamental
flaws :)
First I calculate the target size for each level (in bytes) so that the levels
together can accommodate all my data, i.e. the total size of all my sstables.
This also gives me the maximum level to target. Then I take all sstables for
the given CF and sort them by the beginning (left) of their bounds. Starting
from the highest level (L4 in my example), I iterate over that list: I grab the
first sstable, remember its bounds and assign it to the current level. Then I
skip to the next sstable that does not intersect those bounds, assign it to the
current level and update the bounds, and so on until the end of the list or
until the level's target size is used up. Then I move to the next lower level
and repeat the process on the remaining sstables. Whatever is left at the end
goes to L0, where overlaps are allowed (right?).
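To make the description above concrete, here is a minimal Python sketch of that
greedy pass. It is not the actual tool: the SSTable record, the function names
and the level sizing rule (level N holds sstable_bytes * fanout^N bytes, with a
fanout of 10) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SSTable:
    # Hypothetical record; the fields are assumptions for this sketch.
    name: str
    first: int  # left token bound (inclusive)
    last: int   # right token bound (inclusive)
    size: int   # on-disk size in bytes


def level_targets(total_bytes: int, sstable_bytes: int, fanout: int = 10) -> Dict[int, int]:
    """Return target bytes for L1..Lmax, adding levels until all data fits.

    Assumes level N may hold sstable_bytes * fanout**N bytes.
    """
    targets: Dict[int, int] = {}
    level, capacity = 0, 0
    while capacity < total_bytes:
        level += 1
        targets[level] = sstable_bytes * fanout ** level
        capacity += targets[level]
    return targets


def assign_levels(sstables: List[SSTable], targets: Dict[int, int]) -> Dict[int, List[SSTable]]:
    """Fill levels from the highest down with non-overlapping sstables."""
    remaining = sorted(sstables, key=lambda s: s.first)
    result: Dict[int, List[SSTable]] = {}
    for level in sorted(targets, reverse=True):
        chosen, skipped = [], []
        used = 0
        last_bound = None  # right bound of the last sstable put in this level
        for s in remaining:
            fits = used + s.size <= targets[level]
            disjoint = last_bound is None or s.first > last_bound
            if fits and disjoint:
                chosen.append(s)
                used += s.size
                last_bound = s.last
            else:
                # Overlaps the level so far, or the level is full:
                # leave it for a lower level.
                skipped.append(s)
        result[level] = chosen
        remaining = skipped
    # Whatever could not be placed stays in L0, where overlaps are allowed.
    result[0] = remaining
    return result
```

Because the list is sorted by left bound and an sstable is only accepted when
its left bound lies past the right bound of the last accepted one, the sstables
within each level above L0 cannot overlap.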
I also had to come up with some logic to exclude the sstables that cover a
large range of tokens. Most likely these are the ones that were recently
written at L0 on the original node - they cover whatever was recently written
into them, right? I exclude those from my logic and leave them in L0.
Or did I get it completely wrong?
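The exclusion of wide sstables could look roughly like the sketch below. The
comment does not say how "large" is decided, so the max_fraction threshold, the
ring_span parameter (width of the token range being considered) and the tuple
layout are all assumptions, not the actual logic.

```python
from typing import List, Tuple


def split_wide(
    sstables: List[Tuple[str, int, int]],  # (name, first_token, last_token)
    ring_span: int,
    max_fraction: float = 0.5,  # assumed cutoff, not taken from the comment
) -> Tuple[List[str], List[str]]:
    """Separate sstables covering a large share of the token range.

    Returns (narrow, wide): narrow ones are candidates for L1+,
    wide ones are left in L0.
    """
    narrow: List[str] = []
    wide: List[str] = []
    for name, first, last in sstables:
        coverage = (last - first) / ring_span
        if coverage > max_fraction:
            wide.append(name)
        else:
            narrow.append(name)
    return narrow, wide
```

Only the narrow sstables would then be passed on to the level assignment.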
> Create a tool that given a bunch of sstables creates a "decent" sstable
> leveling
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-8301
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8301
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
>
> In old versions of Cassandra (i.e. not trunk/3.0), when bootstrapping a new
> node, you end up with a ton of files in L0, and it can be extremely
> painful to get LCS to compact them into a new leveling.
> We could probably exploit the fact that we have many non-overlapping sstables
> in L0, and offline-bump those sstables into higher levels. It does not need
> to be perfect, just get the majority of the data into L1+ without creating
> overlaps.
> So, the suggestion is to create an offline tool that looks at the range each
> sstable covers and tries to bump it as high as possible in the leveling.