[
https://issues.apache.org/jira/browse/CASSANDRA-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324337#comment-17324337
]
Alexey Zotov edited comment on CASSANDRA-15669 at 4/17/21, 8:51 PM:
--------------------------------------------------------------------
I have looked into this issue and I think I now have a fairly clear understanding of what
is going on.
[~sunhaihong]
Just out of curiosity - what values do you use for the {{sstable_size_in_mb}} and
{{fanout_size}} params in production, and how much data do you have? I'm
wondering how you managed to run into this issue.
[~marcuse]
It looks like you are the best person to discuss this issue with (as far as I can see,
you actively participated in LCS development).
First of all, I was able to reproduce the issue. I explored the code and I believe
I found a couple of minor problems.
# *Wrong estimates calculation*
There is the following comment in the code:
{code:java}
// allocate enough generations for a PB of data, with a 1-MB sstable size. (Note that if maxSSTableSize is
// updated, we will still have sstables of the older, potentially smaller size. So don't make this
// dependent on maxSSTableSize.)
static final int MAX_LEVEL_COUNT = (int) Math.log10(1000 * 1000 * 1000);
{code}
It claims a PB of data for a 1-MB sstable size, but that does not seem to be
correct: it would be correct if 10 levels were supported, whereas only 9 levels
are currently supported. Here are my calculations (1-MB sstable size and a
fanout size of 10):
{code:java}
L0: 4 * 1 MB = 4 MB
L1: 10^1 * 1 MB = 10 MB
L2: 10^2 * 1 MB = 100 MB
L3: 10^3 * 1 MB = 1000 MB
L4: 10^4 * 1 MB = 10000 MB = 9.76 GB
L5: 10^5 * 1 MB = 100000 MB = 97.65 GB
L6: 10^6 * 1 MB = 1000000 MB = 976.56 GB
L7: 10^7 * 1 MB = 10000000 MB = 9765.62 GB = 9.53 TB
L8: 10^8 * 1 MB = 100000000 MB = 97656.25 GB = 95.36 TB
L9: 10^9 * 1 MB = 1000000000 MB = 976562.50 GB = 953.67 TB <-- this level is not supported
{code}
Here is the place that clearly shows that only 9 levels (including L0) are
supported at the moment:
{code:java}
// note that since l0 is broken out, levels[0] represents L1:
private final TreeSet<SSTableReader> [] levels = new TreeSet[MAX_LEVEL_COUNT - 1];
{code}
Either the comment needs to be fixed or the number of levels needs to be
increased. I believe fixing the comment is the easier option, and the supported
amount of data would still be enough for a regular C* setup.
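To make the numbers above easy to re-check, here is a minimal standalone sketch
(not proposed code; it only reproduces the arithmetic from the table, with the
class name and constants chosen by me):
{code:java}
// Standalone sketch, not part of the codebase: reproduces the table above for
// 1-MB sstables, a fanout of 10, L0 capped at 4 sstables and MAX_LEVEL_COUNT = 9
// (as in the snippet quoted earlier).
public class LcsCapacitySketch
{
    public static void main(String[] args)
    {
        final long sstableSizeBytes = 1L << 20; // 1 MB
        final int fanout = 10;
        // Math.log10 is documented to return exactly n for arguments of 10^n
        final int maxLevelCount = (int) Math.log10(1000 * 1000 * 1000); // 9

        long totalBytes = 4 * sstableSizeBytes; // L0: 4 sstables, as in the table above
        for (int level = 1; level < maxLevelCount; level++) // L1..L8 only, L9 is not supported
        {
            long levelBytes = (long) Math.pow(fanout, level) * sstableSizeBytes;
            totalBytes += levelBytes;
            System.out.printf("L%d: %d MB%n", level, levelBytes >> 20);
        }
        // Prints roughly 106 TB in total (L8 alone is ~95 TB) - an order of
        // magnitude short of the PB promised by the comment on MAX_LEVEL_COUNT.
        System.out.printf("total: %.2f TB%n", totalBytes / Math.pow(1024, 4));
    }
}
{code}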
# *There is no proper handling of the situation where there is more data than
supported*
The issue happens when compaction for L8 is about to start. Here is the
flow: {{getCompactionCandidates}} --> {{getCandidatesFor\(i\)}} -->
{{generations.get(level + 1)}}. So while checking the compaction candidates for
L8, it tries to look at what is going on at L9 and immediately fails. And that is
fair, because we only target supporting a certain amount of data.
Currently the above flow is triggered when {{score > 1.001}} (there is more
data on the level than there should be). In fact, we should not even try to check
compaction candidates for the highest level; we should just fail fast, since
this is an impossible situation for a properly configured C* cluster. I think a
clear error should be thrown when there is an attempt to handle more data than
expected on the highest level, something like:
{code:java}
if (score > 1.001)
{
    // the highest level should not ever exceed its maximum size
    if (i == generations.levelCount() - 1)
        throw new RuntimeException("Highest level (L" + i + ") should not exceed its maximum size (" +
                                   maxBytesForLevel + "), but it has " + bytesForLevel + " bytes");

    // before proceeding with a higher level, let's see if L0 is far enough behind to warrant STCS
    if (l0Compaction != null)
        return l0Compaction;
    ...
}
{code}
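For completeness, here is a tiny standalone illustration of why the top level walks
off the end of the array (this is not the actual Cassandra code, only the indexing
implied by the {{levels}} declaration quoted in the first point):
{code:java}
// Illustration only - mirrors the declaration quoted above, where levels has
// MAX_LEVEL_COUNT - 1 slots and levels[0] represents L1.
public class TopLevelIndexingSketch
{
    static final int MAX_LEVEL_COUNT = (int) Math.log10(1000 * 1000 * 1000); // 9

    public static void main(String[] args)
    {
        Object[] levels = new Object[MAX_LEVEL_COUNT - 1]; // indices 0..7 <=> L1..L8

        int topLevel = MAX_LEVEL_COUNT - 1; // L8, the highest supported level
        int nextLevel = topLevel + 1;       // L9, what the candidate check for L8 looks at

        System.out.println("valid indices: 0.." + (levels.length - 1));
        System.out.println("L" + nextLevel + " would map to index " + (nextLevel - 1)
                           + ", which is out of bounds for length " + levels.length);
    }
}
{code}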
I'd be glad to hear your feedback on the points above. If you find the
suggestions reasonable, I'd like to come up with a patch (I have a draft, but
before polishing it I'd like to validate my understanding). I'd probably also
update the documentation to clearly state the number of supported levels and the
ways to estimate the maximum data size.
> LeveledCompactionStrategy compact last level throw an
> ArrayIndexOutOfBoundsException
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15669
> Project: Cassandra
> Issue Type: Bug
> Reporter: sunhaihong
> Assignee: sunhaihong
> Priority: Normal
> Attachments: cfs_compaction_info.png, error_info.png
>
>
> Cassandra will throw an ArrayIndexOutOfBoundsException when compacting the
> last level.
> My test is as follows:
> # Create a table with LeveledCompactionStrategy and its params are
> 'enabled': 'true', 'fanout_size': '2', 'max_threshold': '32',
> 'min_threshold': '4', 'sstable_size_in_mb': '2' (fanout_size and
> sstable_size_in_mb are set this small just to make it easier to reproduce the
> problem);
> # Insert data into the table by stress;
> # Cassandra throws an ArrayIndexOutOfBoundsException when compacting level 9
> sstables (this level's score is bigger than 1.001)
> ERROR [CompactionExecutor:4] 2020-03-28 08:59:00,990 CassandraDaemon.java:442
> - Exception in thread Thread[CompactionExecutor:4,1,main]
> java.lang.ArrayIndexOutOfBoundsException: 9
> at
> org.apache.cassandra.db.compaction.LeveledManifest.getLevel(LeveledManifest.java:814)
> at
> org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(LeveledManifest.java:746)
> at
> org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:398)
> at
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:131)
> at
> org.apache.cassandra.db.compaction.CompactionStrategyHolder.lambda$getBackgroundTaskSuppliers$0(CompactionStrategyHolder.java:109)
> at
> org.apache.cassandra.db.compaction.AbstractStrategyHolder$TaskSupplier.getTask(AbstractStrategyHolder.java:66)
> at
> org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:214)
> at
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:289)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
> at java.util.concurrent.FutureTask.run(FutureTask.java)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748)
> I tested it on Cassandra versions 3.11.3 & 4.0-alpha3. The exception happened
> on both.
> Once it triggers, level 1 - level n compaction no longer works; level 0
> compaction still works.
>