[ 
https://issues.apache.org/jira/browse/CASSANDRA-18619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-18619:
------------------------------------------
    Description: 
Let's have a node with 8 cores and run "nodetool setconcurrentcompactors 4".

When I run "nodetool garbagecollect", I can specify the number of "jobs" via 
the -j flag. If I set it to 2, at most two threads will be compacting; if I set 
it to 6, it will in practice be capped at 4, as that is my 
"concurrentcompactors" setting.

So far so good.

However, when I set jobs to 4 and run garbage collection on two tables, 
tb1 and tb2, like this:
{code:java}
nodetool garbagecollect -j 4 -- keyspace1 tb1 tb2
{code}
What happens is that it starts to gc the first table, at most 4 sstables at a 
time, AND ONLY THEN does it start to gc the second table.

In other words, if tb1 has 10 sstables to gc and I have 4 jobs at max, it will 
gc them, but if one looks into compactionstats, one sees that as gc-ing 
progresses, there might be e.g. just 2 sstables left to gc, so in theory there 
are slots for two additional sstables to be gc-ed as well, but this will not 
happen. It will wait until the first table is gc-ed and only then start to gc 
the second one with 4 threads.

This might be improved so that as soon as there is a free job thread, the next 
sstable is scheduled to be gc-ed even if it is from a different cql table.
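
To make the difference concrete, here is a small standalone simulation. It is 
only a sketch and not Cassandra's actual code: the class and method names are 
made up for illustration, every sstable is assumed to take one time unit to gc, 
and tb2 is assumed to have 10 sstables as well. It compares finishing one table 
completely before starting the next with feeding all sstables into one shared 
queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names exist in Cassandra.
class GcSchedulingSketch {

    // The -j value is capped by the concurrent compactors setting.
    static int effectiveJobs(int requestedJobs, int concurrentCompactors) {
        return Math.min(requestedJobs, concurrentCompactors);
    }

    // Greedy scheduling: each sstable goes to the worker that frees up first.
    // Returns the time at which the last sstable finishes.
    static long makespan(List<Integer> durations, int workers) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < workers; i++) {
            freeAt.add(0L);
        }
        long end = 0;
        for (int d : durations) {
            long finish = freeAt.poll() + d;
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    // Behaviour described above: the next table starts only after the
    // previous one is completely done, so idle workers stay idle.
    static long perTableBarrier(List<List<Integer>> tables, int workers) {
        long total = 0;
        for (List<Integer> table : tables) {
            total += makespan(table, workers);
        }
        return total;
    }

    // Suggested behaviour: sstables of all tables share one queue, so a
    // free worker immediately picks up the next sstable of any table.
    static long sharedQueue(List<List<Integer>> tables, int workers) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> table : tables) {
            all.addAll(table);
        }
        return makespan(all, workers);
    }

    public static void main(String[] args) {
        int jobs = effectiveJobs(6, 4); // -j 6 with 4 concurrent compactors
        List<Integer> tb1 = Collections.nCopies(10, 1); // 10 sstables, 1 unit each
        List<Integer> tb2 = Collections.nCopies(10, 1); // assumed size
        List<List<Integer>> tables = Arrays.asList(tb1, tb2);

        System.out.println("effective jobs:    " + jobs);
        System.out.println("per-table barrier: " + perTableBarrier(tables, jobs));
        System.out.println("shared queue:      " + sharedQueue(tables, jobs));
    }
}
```

Under these assumptions the per-table variant needs 3 + 3 = 6 time units, 
because two workers sit idle while the last 2 sstables of each table are 
processed, whereas the shared queue finishes all 20 sstables in 5.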

> nodetool garbagecollect does not use all available compaction executors
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-18619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18619
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stefan Miklosovic
>            Priority: Normal


