[ https://issues.apache.org/jira/browse/CASSANDRA-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226708#comment-14226708 ]
Marcus Eriksson commented on CASSANDRA-8301:
--------------------------------------------

Cool, what is your heuristic for finding the level?

I thought a bit about it and figured that we could probably estimate the level by ordering sstables by the number of other sstables they overlap, then putting the ones that overlap the most in the lowest levels.

I.e., an sstable in L1 is bound to overlap ~10 in L2, ~100 in L3, etc., meaning it would overlap ~110 sstables if we only have 3 levels. An sstable in L2 would overlap ~10 in L3 and only one in L1, 11 in total, and sstables in the top level would only overlap one in L2 and one in L1, 2 in total.

This assumes L0 was empty when bootstrapping, which is most often wrong, and I haven't given much thought to how to fix that.

> Create a tool that given a bunch of sstables creates a "decent" sstable leveling
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8301
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8301
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>
> In old versions of Cassandra (i.e. not trunk/3.0), when bootstrapping a new
> node, you will end up with a ton of files in L0 and it might be extremely
> painful to get LCS to compact into a new leveling.
> We could probably exploit the fact that we have many non-overlapping sstables
> in L0, and offline-bump those sstables into higher levels. It does not need
> to be perfect, just get the majority of the data into L1+ without creating
> overlaps.
> So, the suggestion is to create an offline tool that looks at the range each
> sstable covers and tries to bump it as high as possible in the leveling.
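
The overlap-count heuristic from the comment could be sketched roughly as follows. This is only an illustration, not the tool's actual code: the `(first_token, last_token)` tuple representation, the function names, and the rule that level k holds at most `fanout**k` sstables are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: inclusive (first_token, last_token) per sstable.
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive token ranges share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_counts(sstables: List[Range]) -> List[int]:
    """For each sstable, count how many *other* sstables it overlaps (O(n^2))."""
    return [
        sum(1 for j, other in enumerate(sstables) if j != i and overlaps(s, other))
        for i, s in enumerate(sstables)
    ]


def estimate_levels(sstables: List[Range], fanout: int = 10) -> List[int]:
    """Estimate a level per sstable: the most-overlapping sstables go lowest.

    Assumption: level k holds at most fanout**k sstables, so the first
    `fanout` sstables in the ordering land in L1, the next fanout**2 in L2, ...
    """
    counts = overlap_counts(sstables)
    # Indices ordered by descending overlap count (sorted() is stable).
    order = sorted(range(len(sstables)), key=lambda i: -counts[i])
    levels = [0] * len(sstables)
    level, capacity, used = 1, fanout, 0
    for i in order:
        if used == capacity:
            # Current level is full; move on to the next, larger level.
            level += 1
            capacity = fanout ** level
            used = 0
        levels[i] = level
        used += 1
    return levels


if __name__ == "__main__":
    # Two wide sstables, each covering two narrow ones.
    tables = [(0, 49), (50, 99), (0, 24), (25, 49), (50, 74), (75, 99)]
    print(overlap_counts(tables))
    print(estimate_levels(tables, fanout=2))
```

Note that this sketch only ranks sstables; it does not verify that the sstables placed in one level are free of overlaps with each other, and it shares the caveat above about L0 not having been empty when the data was streamed.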