Re: Compaction throughput

2019-07-19 Thread Vlad
 Thanks!

On Friday, July 19, 2019, 10:15:43 PM GMT+3, Jon Haddad wrote:

 It's a limit on the total compaction throughput. 
On Fri, Jul 19, 2019 at 10:39 AM Vlad  wrote:

Hi,
does 'nodetool setcompactionthroughput' set the limit for all compactions on the
node, or is it per compaction thread?
Thanks.

  

Re: Compaction throughput

2019-07-19 Thread Jon Haddad
It's a limit on the total compaction throughput.
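For illustration, a quick session (hypothetical values; the exact wording of
the output varies by Cassandra version):

# Show the current node-wide cap in MB/s (16 was the long-time default).
nodetool getcompactionthroughput

# Raise the cap; all compaction threads on the node share this budget,
# it is not a per-thread limit. 0 disables throttling entirely.
nodetool setcompactionthroughput 64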

On Fri, Jul 19, 2019 at 10:39 AM Vlad  wrote:

> Hi,
>
> does 'nodetool setcompactionthroughput' set the limit for all compactions on
> the node, or is it per compaction thread?
>
> Thanks.
>


Compaction throughput

2019-07-19 Thread Vlad
Hi,
does 'nodetool setcompactionthroughput' set the limit for all compactions on the
node, or is it per compaction thread?
Thanks.


Re: Compaction throughput vs. number of compaction threads?

2018-06-05 Thread Alexander Dejanovski
Hi,

The compaction throughput is indeed shared by all compactors.
I would not advise going below 8MB/s per compactor, as slowing down
compactions puts more pressure on the heap.

When tuning compaction, the first thing to do is evaluate the maximum
throughput your disks can sustain without impacting p99 read latencies.
Then you can consider raising the number of compactors if you're still
seeing contention.

So the advice would be: don't raise the number of compactors (4 is probably
enough already), and tune the compaction throughput if you're running on SSDs
or have an array of HDDs.
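As a back-of-the-envelope sketch of that guidance (figures are illustrative):
the cap is shared, so if you do run more compactors, the cap has to scale to
keep each one at 8MB/s or more.

# With the cap left at 16, 8 busy compactors get ~2MB/s each, well below
# the ~8MB/s floor suggested above. Keeping 8 compactors at >= 8MB/s each
# means a node-wide cap of at least 8 x 8 = 64:
compaction_throughput_mb_per_sec: 64

# Or at runtime, without a restart:
nodetool setcompactionthroughput 64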

Cheers,

On Tue, Jun 5, 2018 at 10:48 AM Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:

> Hello,
>
>
>
> most likely obvious and perhaps already answered in the past, but just
> want to be sure …
>
>
>
> E.g. I have set:
>
> concurrent_compactors: 4
>
> compaction_throughput_mb_per_sec: 16
>
>
>
> I guess this will lead to ~4MB/s per thread if I have 4 compactions
> running in parallel?
>
>
>
> So, in case of upscaling a machine and following the recommendation in
> cassandra.yaml I may set:
>
>
>
> concurrent_compactors: 8
>
>
>
>
>
> If the throughput remains unchanged, does this mean we then have 2 MB/s per
> thread, i.e. largish compactions running on a single thread take twice as
> long?
>
>
>
> Using Cassandra 2.1 and 3.11 in case this matters.
>
>
>
>
>
> Thanks a lot!
>
> Thomas
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Compaction throughput vs. number of compaction threads?

2018-06-05 Thread Steinmaurer, Thomas
Hello,

most likely obvious and perhaps already answered in the past, but just want to 
be sure ...

E.g. I have set:

concurrent_compactors: 4
compaction_throughput_mb_per_sec: 16

I guess this will lead to ~4MB/s per thread if I have 4 compactions running in
parallel?

So, in case of upscaling a machine and following the recommendation in 
cassandra.yaml I may set:

concurrent_compactors: 8


If the throughput remains unchanged, does this mean we then have 2 MB/s per
thread, i.e. largish compactions running on a single thread take twice as long?

Using Cassandra 2.1 and 3.11 in case this matters.


Thanks a lot!
Thomas



Re: compaction throughput

2016-01-29 Thread Jan Karlsson
Keep in mind that LCS can only run one compaction per level. Even if it
wants to run more compactions in L0, it may be blocked because a compaction
is already running in L0.


BR
Jan

On 01/16/2016 01:26 AM, Sebastian Estevez wrote:


LCS is IO intensive, but CPU is also relevant.

On slower disks compaction may not be CPU bound.

If you aren't seeing more than one compaction thread at a time, I 
suspect your system is not compaction bound.


all the best,

Sebastián

On Jan 15, 2016 7:20 PM, "Kai Wang" <dep...@gmail.com> wrote:


Sebastian,

Because I have this impression that LCS is IO intensive and it's
recommended only on SSDs. So I am curious to see how far it can
stress those SSDs. But it turns out the most expensive part about
LCS is not IO bound but CPU bound, or more precisely single core
speed bound. This is a little surprising.

Of course LCS is still superior in other aspects.

On Jan 15, 2016 6:34 PM, "Sebastian Estevez"
<sebastian.este...@datastax.com> wrote:

Correct.

Why are you concerned with the raw throughput, are you
accumulating pending compactions? Are you seeing high sstables
per read statistics?

all the best,

Sebastián

On Jan 15, 2016 6:18 PM, "Kai Wang" <dep...@gmail.com> wrote:

Jeff & Sebastian,

Thanks for the reply. There are 12 cores but in my case C*
only uses one core most of the time. *nodetool
compactionstats* shows there's only one compactor running.
I can see C* process only uses one core. So I guess I
should've asked the question more clearly:

1. Is ~25 M/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single core compaction throughput?
3. Is concurrent_compactors the only option to parallelize compaction? If so,
I guess it's the compaction strategy itself that decides when to parallelize
and when to block on one core. Then there's not much we can do here.

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa
<jeff.ji...@crowdstrike.com> wrote:

With SSDs, the typical recommendation is up to 0.8-1
compactor per core (depending on other load). How many
CPU cores do you have?


From: Kai Wang
Reply-To: "user@cassandra.apache.org
<mailto:user@cassandra.apache.org>"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org
<mailto:user@cassandra.apache.org>"
Subject: compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction
on my node. The node is CentOS 7 and has SSDs
installed. The table is configured to use LCS. Here is
my compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

*nodetool compaction* shows that most of the time there is one
compaction. Sometimes there are 3-4 (I suppose this is
controlled by concurrent_compactors). During the
compaction, I see one CPU core is 100%. At that point,
disk IO is about 20-25 M/s write which is much lower
than the disk is capable of. Even when there are 4
compactions running, I see CPU go to +400% but disk IO
is still at 20-25M/s write. I use *nodetool
setcompactionthroughput 0* to disable the compaction
throttle but don't see any difference.

Does this mean compaction is CPU bound? If so, 20M/s is
kinda low. Is there any way to improve the throughput?

Thanks.






Re: compaction throughput

2016-01-21 Thread PenguinWhispererThe .
Thanks for that clarification Sebastian! That's really good to know! I
never considered increasing this value because of my previous experience.

In my case I had a table that was compacting over and over... and only one
CPU was used. So that made me believe it was not multithreaded (I actually
believe I asked this on IRC, however it's been a few months so I might
be wrong).

Have there been behavioral changes on this lately? (I was using 2.0.9 or
2.0.11 I believe).

2016-01-21 14:15 GMT+01:00 Sebastian Estevez <sebastian.este...@datastax.com
>:

> >So compaction of one table will NOT spread over different cores.
>
> This is not exactly true. You actually can have multiple compactions
> running at the same time on the same table, it just doesn't happen all that
> often. You essentially would have to have two sets of sstables that are
> both eligible for compactions at the same time.
>
> all the best,
>
> Sebastián
> On Jan 21, 2016 7:41 AM, "PenguinWhispererThe ." <
> th3penguinwhispe...@gmail.com> wrote:
>
>> After having some issues myself with compaction, I think it's noteworthy
>> to explicitly state that compaction of a table can only run on one CPU. So
>> compaction of one table will NOT spread over different cores.
>> To really make use of concurrent_compactors you need to have multiple
>> table compactions initiated at the same time. If those are small they'll
>> finish much earlier, resulting in only one core at 100%, as compaction is
>> generally CPU bound (unless your disks can't keep up).
>> I believe it's better for compaction to be CPU (core) bound on one core
>> (or at least not all) than disk IO bound, as the latter would impact the
>> performance of writes and reads.
>> Compaction is a maintenance task, so it shouldn't be eating all your
>> resources.
>>
>> 2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
>>
>>> Jeff & Sebastian,
>>>
>>> Thanks for the reply. There are 12 cores but in my case C* only uses one
>>> core most of the time. *nodetool compactionstats* shows there's only
>>> one compactor running. I can see C* process only uses one core. So I guess
>>> I should've asked the question more clearly:
>>>
>>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>>> 2. Is there any configuration that affects single core compaction
>>> throughput?
>>> 3. Is concurrent_compactors the only option to parallelize compaction?
>>> If so, I guess it's the compaction strategy itself that decides when to
>>> parallelize and when to block on one core. Then there's not much we can do
>>> here.
>>>
>>> Thanks.
>>>
>>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
>>>> (depending on other load).  How many CPU cores do you have?
>>>>
>>>>
>>>> From: Kai Wang
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Friday, January 15, 2016 at 12:53 PM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: compaction throughput
>>>>
>>>> Hi,
>>>>
>>>> I am trying to figure out the bottleneck of compaction on my node. The
>>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>>> LCS. Here is my compaction related configs in cassandra.yaml:
>>>>
>>>> compaction_throughput_mb_per_sec: 160
>>>> concurrent_compactors: 4
>>>>
>>>> I insert about 10G of data and start observing compaction.
>>>>
>>>> *nodetool compaction* shows that most of the time there is one compaction.
>>>> Sometimes there are 3-4 (I suppose this is controlled by
>>>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>>>> At that point, disk IO is about 20-25 M/s write which is much lower than
>>>> the disk is capable of. Even when there are 4 compactions running, I see
>>>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>>>> setcompactionthroughput 0* to disable the compaction throttle but
>>>> don't see any difference.
>>>>
>>>> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is
>>>> there any way to improve the throughput?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>


Re: compaction throughput

2016-01-21 Thread Peddi, Praveen
That is interesting...
We recently resolved a performance issue solely by increasing the 
concurrent_compactors parameter from its default to 64. We have two tables, but 
90% of the data is in one table. We got a read performance boost of more than 
100% just by increasing that parameter in the yaml. Based on what you said, my 
observations look contradictory. Could you elaborate on how you came to that 
conclusion?
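For reference, a change like the one described would look roughly like this
(illustrative only; 64 is far above the core-count guidance discussed
elsewhere in this thread, so treat it as something to validate, not a
starting point):

# cassandra.yaml
concurrent_compactors: 64

# Then verify how many compactions actually run in parallel:
nodetool compactionstats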


On Jan 21, 2016, at 7:42 AM, PenguinWhispererThe . 
<th3penguinwhispe...@gmail.com> wrote:

After having some issues myself with compaction, I think it's noteworthy to 
explicitly state that compaction of a table can only run on one CPU. So 
compaction of one table will NOT spread over different cores.
To really make use of concurrent_compactors you need to have multiple table 
compactions initiated at the same time. If those are small they'll finish much 
earlier, resulting in only one core at 100%, as compaction is generally CPU 
bound (unless your disks can't keep up).
I believe it's better for compaction to be CPU (core) bound on one core (or at 
least not all) than disk IO bound, as the latter would impact the performance 
of writes and reads.
Compaction is a maintenance task, so it shouldn't be eating all your resources.


2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
Jeff & Sebastian,

Thanks for the reply. There are 12 cores but in my case C* only uses one core 
most of the time. nodetool compactionstats shows there's only one compactor 
running. I can see C* process only uses one core. So I guess I should've asked 
the question more clearly:

1. Is ~25 M/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single core compaction throughput?
3. Is concurrent_compactors the only option to parallelize compaction? If so, I 
guess it's the compaction strategy itself that decides when to parallelize and 
when to block on one core. Then there's not much we can do here.

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
With SSDs, the typical recommendation is up to 0.8-1 compactor per core 
(depending on other load).  How many CPU cores do you have?


From: Kai Wang
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Subject: compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction on my node. The node is 
CentOS 7 and has SSDs installed. The table is configured to use LCS. Here is my 
compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

nodetool compaction shows that most of the time there is one compaction. Sometimes there 
are 3-4 (I suppose this is controlled by concurrent_compactors). During the 
compaction, I see one CPU core is 100%. At that point, disk IO is about 20-25 
M/s write which is much lower than the disk is capable of. Even when there are 
4 compactions running, I see CPU go to +400% but disk IO is still at 20-25M/s 
write. I use nodetool setcompactionthroughput 0 to disable the compaction 
throttle but don't see any difference.

Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there 
any way to improve the throughput?

Thanks.




Re: compaction throughput

2016-01-21 Thread Kai Wang
I am using 2.2.4 and have seen multiple compactors running on the same
table. The number of compactors seems to be controlled by
concurrent_compactors. As for types of compactions, I've seen normal
compaction and tombstone compaction. Validation and anticompaction seem to
always be single-threaded.
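A quick way to watch this yourself (commands only; output columns vary by
version, and the keyspace/table names are placeholders):

# Active compactions with their type (Compaction, Validation,
# Anticompaction, ...), keyspace, table, and progress:
nodetool compactionstats

# Per-table stats, including pending compactions:
nodetool cfstats my_keyspace.my_table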

On Thu, Jan 21, 2016 at 8:28 AM, PenguinWhispererThe . <
th3penguinwhispe...@gmail.com> wrote:

> Thanks for that clarification Sebastian! That's really good to know! I
> never considered increasing this value because of my previous experience.
>
> In my case I had a table that was compacting over and over... and only one
> CPU was used. So that made me believe it was not multithreaded (I actually
> believe I asked this on IRC, however it's been a few months so I might
> be wrong).
>
> Have there been behavioral changes on this lately? (I was using 2.0.9 or
> 2.0.11 I believe).
>
> 2016-01-21 14:15 GMT+01:00 Sebastian Estevez <
> sebastian.este...@datastax.com>:
>
>> >So compaction of one table will NOT spread over different cores.
>>
>> This is not exactly true. You actually can have multiple compactions
>> running at the same time on the same table, it just doesn't happen all that
>> often. You essentially would have to have two sets of sstables that are
>> both eligible for compactions at the same time.
>>
>> all the best,
>>
>> Sebastián
>> On Jan 21, 2016 7:41 AM, "PenguinWhispererThe ." <
>> th3penguinwhispe...@gmail.com> wrote:
>>
>>> After having some issues myself with compaction, I think it's noteworthy
>>> to explicitly state that compaction of a table can only run on one CPU. So
>>> compaction of one table will NOT spread over different cores.
>>> To really make use of concurrent_compactors you need to have multiple
>>> table compactions initiated at the same time. If those are small they'll
>>> finish much earlier, resulting in only one core at 100%, as compaction is
>>> generally CPU bound (unless your disks can't keep up).
>>> I believe it's better for compaction to be CPU (core) bound on one core
>>> (or at least not all) than disk IO bound, as the latter would impact the
>>> performance of writes and reads.
>>> Compaction is a maintenance task, so it shouldn't be eating all your
>>> resources.
>>>
>>>
>>> 2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
>>>
>>>> Jeff & Sebastian,
>>>>
>>>> Thanks for the reply. There are 12 cores but in my case C* only uses
>>>> one core most of the time. *nodetool compactionstats* shows there's
>>>> only one compactor running. I can see C* process only uses one core. So I
>>>> guess I should've asked the question more clearly:
>>>>
>>>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>>>> 2. Is there any configuration that affects single core compaction
>>>> throughput?
>>>> 3. Is concurrent_compactors the only option to parallelize compaction?
>>>> If so, I guess it's the compaction strategy itself that decides when to
>>>> parallelize and when to block on one core. Then there's not much we can do
>>>> here.
>>>>
>>>> Thanks.
>>>>
>>>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com
>>>> > wrote:
>>>>
>>>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per
>>>>> core (depending on other load).  How many CPU cores do you have?
>>>>>
>>>>>
>>>>> From: Kai Wang
>>>>> Reply-To: "user@cassandra.apache.org"
>>>>> Date: Friday, January 15, 2016 at 12:53 PM
>>>>> To: "user@cassandra.apache.org"
>>>>> Subject: compaction throughput
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to figure out the bottleneck of compaction on my node. The
>>>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>>>> LCS. Here is my compaction related configs in cassandra.yaml: [...]

Re: compaction throughput

2016-01-21 Thread Sebastian Estevez
>So compaction of one table will NOT spread over different cores.

This is not exactly true. You actually can have multiple compactions
running at the same time on the same table, it just doesn't happen all that
often. You essentially would have to have two sets of sstables that are
both eligible for compactions at the same time.

all the best,

Sebastián
On Jan 21, 2016 7:41 AM, "PenguinWhispererThe ." <
th3penguinwhispe...@gmail.com> wrote:

> After having some issues myself with compaction, I think it's noteworthy to
> explicitly state that compaction of a table can only run on one CPU. So
> compaction of one table will NOT spread over different cores.
> To really make use of concurrent_compactors you need to have multiple
> table compactions initiated at the same time. If those are small they'll
> finish much earlier, resulting in only one core at 100%, as compaction is
> generally CPU bound (unless your disks can't keep up).
> I believe it's better for compaction to be CPU (core) bound on one core
> (or at least not all) than disk IO bound, as the latter would impact the
> performance of writes and reads.
> Compaction is a maintenance task, so it shouldn't be eating all your
> resources.
>
>
> 2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
>
>> Jeff & Sebastian,
>>
>> Thanks for the reply. There are 12 cores but in my case C* only uses one
>> core most of the time. *nodetool compactionstats* shows there's only one
>> compactor running. I can see C* process only uses one core. So I guess I
>> should've asked the question more clearly:
>>
>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>> 2. Is there any configuration that affects single core compaction
>> throughput?
>> 3. Is concurrent_compactors the only option to parallelize compaction? If
>> so, I guess it's the compaction strategy itself that decides when to
>> parallelize and when to block on one core. Then there's not much we can do
>> here.
>>
>> Thanks.
>>
>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
>>> (depending on other load).  How many CPU cores do you have?
>>>
>>>
>>> From: Kai Wang
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Friday, January 15, 2016 at 12:53 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: compaction throughput
>>>
>>> Hi,
>>>
>>> I am trying to figure out the bottleneck of compaction on my node. The
>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>> LCS. Here is my compaction related configs in cassandra.yaml:
>>>
>>> compaction_throughput_mb_per_sec: 160
>>> concurrent_compactors: 4
>>>
>>> I insert about 10G of data and start observing compaction.
>>>
>>> *nodetool compaction* shows that most of the time there is one compaction.
>>> Sometimes there are 3-4 (I suppose this is controlled by
>>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>>> At that point, disk IO is about 20-25 M/s write which is much lower than
>>> the disk is capable of. Even when there are 4 compactions running, I see
>>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>>> setcompactionthroughput 0* to disable the compaction throttle but don't
>>> see any difference.
>>>
>>> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is
>>> there any way to improve the throughput?
>>>
>>> Thanks.
>>>
>>
>>
>


Re: compaction throughput

2016-01-21 Thread Sebastian Estevez
@penguin There have been steady improvements in the different compaction
strategies recently but not major re-writes.

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, Jan 21, 2016 at 9:12 AM, Kai Wang <dep...@gmail.com> wrote:

> I am using 2.2.4 and have seen multiple compactors running on the same
> table. The number of compactors seems to be controlled by
> concurrent_compactors. As for types of compactions, I've seen normal
> compaction and tombstone compaction. Validation and anticompaction seem to
> always be single-threaded.
>
> On Thu, Jan 21, 2016 at 8:28 AM, PenguinWhispererThe . <
> th3penguinwhispe...@gmail.com> wrote:
>
>> Thanks for that clarification Sebastian! That's really good to know! I
>> never considered increasing this value because of my previous experience.
>>
>> In my case I had a table that was compacting over and over... and only
>> one CPU was used. So that made me believe it was not multithreaded (I
>> actually believe I asked this on IRC, however it's been a few months so
>> I might be wrong).
>>
>> Have there been behavioral changes on this lately? (I was using 2.0.9 or
>> 2.0.11 I believe).
>>
>> 2016-01-21 14:15 GMT+01:00 Sebastian Estevez <
>> sebastian.este...@datastax.com>:
>>
>>> >So compaction of one table will NOT spread over different cores.
>>>
>>> This is not exactly true. You actually can have multiple compactions
>>> running at the same time on the same table, it just doesn't happen all that
>>> often. You essentially would have to have two sets of sstables that are
>>> both eligible for compactions at the same time.
>>>
>>> all the best,
>>>
>>> Sebastián
>>> On Jan 21, 2016 7:41 AM, "PenguinWhispererThe ." <
>>> th3penguinwhispe...@gmail.com> wrote:
>>>
>>>> After having some issues myself with compaction, I think it's noteworthy
>>>> to explicitly state that compaction of a table can only run on one CPU. So
>>>> compaction of one table will NOT spread over different cores.
>>>> To really make use of concurrent_compactors you need to have multiple
>>>> table compactions initiated at the same time. If those are small they'll
>>>> finish much earlier, resulting in only one core at 100%, as compaction is
>>>> generally CPU bound (unless your disks can't keep up).
>>>> I believe it's better for compaction to be CPU (core) bound on one core
>>>> (or at least not all) than disk IO bound, as the latter would impact the
>>>> performance of writes and reads.
>>>> Compaction is a maintenance task, so it shouldn't be eating all your
>>>> resources.
>>>>
>>>>
>>>> 2016-01-16 0:18 GMT+01:00 Kai Wang <dep...@gmail.com>:
>>>>
>>>>> Jeff & Sebastian,
>>>>>
>>>>> Thanks for the reply. There are 12 cores but in my case C* only uses
>>>>> one core most of the time. *nodetool compactionstats* shows there's
>>>>> only one compactor running. I can see C* process only uses one core. So I
>>>>> guess I should've asked the question more clearly:
>>>>>
>>

compaction throughput

2016-01-15 Thread Kai Wang
Hi,

I am trying to figure out the bottleneck of compaction on my node. The node
is CentOS 7 and has SSDs installed. The table is configured to use LCS.
Here is my compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

*nodetool compaction* shows that most of the time there is one compaction. Sometimes
there are 3-4 (I suppose this is controlled by concurrent_compactors).
During the compaction, I see one CPU core is 100%. At that point, disk IO
is about 20-25 M/s write which is much lower than the disk is capable of.
Even when there are 4 compactions running, I see CPU go to +400% but disk
IO is still at 20-25M/s write. I use *nodetool setcompactionthroughput 0*
to disable the compaction throttle but don't see any difference.

Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there
any way to improve the throughput?

Thanks.
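One way to narrow down where the bottleneck is (a sketch; matching the
process with pgrep -f CassandraDaemon is an assumption about the startup
command line):

# Are multiple compaction threads actually busy?
nodetool compactionstats

# Per-thread CPU of the Cassandra JVM; one thread pinned near 100% while
# the rest idle suggests a single-core-bound compaction.
top -H -p "$(pgrep -f CassandraDaemon | head -1)"

# Disk utilization; low %util alongside a pinned core points at CPU.
iostat -x 1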


Re: compaction throughput

2016-01-15 Thread Kai Wang
I forgot to mention I am using C* 2.2.4.
On Jan 15, 2016 3:53 PM, "Kai Wang"  wrote:

> Hi,
>
> I am trying to figure out the bottleneck of compaction on my node. The
> node is CentOS 7 and has SSDs installed. The table is configured to use
> LCS. Here is my compaction related configs in cassandra.yaml:
>
> compaction_throughput_mb_per_sec: 160
> concurrent_compactors: 4
>
> I insert about 10G of data and start observing compaction.
>
> *nodetool compaction* shows that most of the time there is one compaction.
> Sometimes there are 3-4 (I suppose this is controlled by
> concurrent_compactors). During the compaction, I see one CPU core is 100%.
> At that point, disk IO is about 20-25 M/s write which is much lower than
> the disk is capable of. Even when there are 4 compactions running, I see
> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
> setcompactionthroughput 0* to disable the compaction throttle but don't
> see any difference.
>
> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there
> any way to improve the throughput?
>
> Thanks.
>


Re: compaction throughput

2016-01-15 Thread Jeff Ferland
Compaction is generally CPU bound and relatively slow. Exactly why that is, I’m
uncertain.

> On Jan 15, 2016, at 12:53 PM, Kai Wang  wrote:
> 
> Hi,
> 
> I am trying to figure out the bottleneck of compaction on my node. The node 
> is CentOS 7 and has SSDs installed. The table is configured to use LCS. Here 
> is my compaction related configs in cassandra.yaml:
> 
> compaction_throughput_mb_per_sec: 160
> concurrent_compactors: 4
> 
> I insert about 10G of data and start observing compaction.
> 
> nodetool compaction shows that most of the time there is one compaction. Sometimes 
> there are 3-4 (I suppose this is controlled by concurrent_compactors). During 
> the compaction, I see one CPU core is 100%. At that point, disk IO is about 
> 20-25 M/s write which is much lower than the disk is capable of. Even when 
> there are 4 compactions running, I see CPU go to +400% but disk IO is still 
> at 20-25M/s write. I use nodetool setcompactionthroughput 0 to disable the 
> compaction throttle but don't see any difference.
> 
> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there 
> any way to improve the throughput?
> 
> Thanks.



Re: compaction throughput

2016-01-15 Thread Sebastian Estevez
*nodetool setcompactionthroughput 0* will only affect future compactions, not
the ones that are currently running.
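So a sequence like this (hypothetical) only guarantees the new rate for
compactions that start afterwards:

nodetool setcompactionthroughput 0   # remove the throttle
nodetool getcompactionthroughput     # confirm the node-wide setting
# Compactions already in flight may keep their old rate until they finish.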

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Fri, Jan 15, 2016 at 4:40 PM, Jeff Ferland  wrote:

> Compaction is generally CPU bound and relatively slow. Exactly why that is
> I’m uncertain.
>
> On Jan 15, 2016, at 12:53 PM, Kai Wang  wrote:
>
> Hi,
>
> I am trying to figure out the bottleneck of compaction on my node. The
> node is CentOS 7 and has SSDs installed. The table is configured to use
> LCS. Here is my compaction related configs in cassandra.yaml:
>
> compaction_throughput_mb_per_sec: 160
> concurrent_compactors: 4
>
> I insert about 10G of data and start observing compaction.
>
> *nodetool compaction* shows that most of the time there is one compaction.
> Sometimes there are 3-4 (I suppose this is controlled by
> concurrent_compactors). During the compaction, I see one CPU core is 100%.
> At that point, disk IO is about 20-25 M/s write which is much lower than
> the disk is capable of. Even when there are 4 compactions running, I see
> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
> setcompactionthroughput 0* to disable the compaction throttle but don't
> see any difference.
>
> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there
> any way to improve the throughput?
>
> Thanks.
>
>
>


Re: compaction throughput

2016-01-15 Thread Jeff Jirsa
With SSDs, the typical recommendation is up to 0.8-1 compactor per core 
(depending on other load).  How many CPU cores do you have?
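As a worked example of that rule of thumb (illustrative numbers only): a
12-core box would support roughly 0.8 x 12 = ~9 compactors as an upper bound,
so leaving headroom for reads and writes:

# cassandra.yaml
concurrent_compactors: 8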


From:  Kai Wang
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, January 15, 2016 at 12:53 PM
To:  "user@cassandra.apache.org"
Subject:  compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction on my node. The node is 
CentOS 7 and has SSDs installed. The table is configured to use LCS. Here is my 
compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

nodetool compaction shows that most of the time there is one compaction. Sometimes there 
are 3-4 (I suppose this is controlled by concurrent_compactors). During the 
compaction, I see one CPU core is 100%. At that point, disk IO is about 20-25 
M/s write which is much lower than the disk is capable of. Even when there are 
4 compactions running, I see CPU go to +400% but disk IO is still at 20-25M/s 
write. I use nodetool setcompactionthroughput 0 to disable the compaction 
throttle but don't see any difference.

Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there 
any way to improve the throughput?

Thanks.





Re: compaction throughput

2016-01-15 Thread Kai Wang
Jeff & Sebastian,

Thanks for the reply. There are 12 cores but in my case C* only uses one
core most of the time. *nodetool compactionstats* shows there's only one
compactor running. I can see C* process only uses one core. So I guess I
should've asked the question more clearly:

1. Is ~25 M/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single core compaction
throughput?
3. Is concurrent_compactors the only option to parallelize compaction? If
so, I guess it's the compaction strategy itself that decides when to
parallelize and when to block on one core. Then there's not much we can do
here.

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
> (depending on other load).  How many CPU cores do you have?
>
>
> From: Kai Wang
> Reply-To: "user@cassandra.apache.org"
> Date: Friday, January 15, 2016 at 12:53 PM
> To: "user@cassandra.apache.org"
> Subject: compaction throughput
>
> Hi,
>
> I am trying to figure out the bottleneck of compaction on my node. The
> node is CentOS 7 and has SSDs installed. The table is configured to use
> LCS. Here is my compaction related configs in cassandra.yaml:
>
> compaction_throughput_mb_per_sec: 160
> concurrent_compactors: 4
>
> I insert about 10G of data and start observing compaction.
>
> *nodetool compaction* shows that most of the time there is one compaction.
> Sometimes there are 3-4 (I suppose this is controlled by
> concurrent_compactors). During the compaction, I see one CPU core is 100%.
> At that point, disk IO is about 20-25 M/s write which is much lower than
> the disk is capable of. Even when there are 4 compactions running, I see
> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
> setcompactionthroughput 0* to disable the compaction throttle but don't
> see any difference.
>
> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is there
> any way to improve the throughput?
>
> Thanks.
>


Re: compaction throughput

2016-01-15 Thread Sebastian Estevez
Correct.

Why are you concerned with the raw throughput, are you accumulating pending
compactions? Are you seeing high sstables per read statistics?
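Both symptoms can be checked directly (keyspace/table names are placeholders):

# "pending tasks" growing over time means compaction is falling behind:
nodetool compactionstats

# The SSTables-per-read histogram; consistently high counts mean reads touch
# many sstables because compaction isn't keeping up:
nodetool cfhistograms my_keyspace my_table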

all the best,

Sebastián
On Jan 15, 2016 6:18 PM, "Kai Wang" <dep...@gmail.com> wrote:

> Jeff & Sebastian,
>
> Thanks for the reply. There are 12 cores but in my case C* only uses one
> core most of the time. *nodetool compactionstats* shows there's only one
> compactor running. I can see C* process only uses one core. So I guess I
> should've asked the question more clearly:
>
> 1. Is ~25 M/s a reasonable compaction throughput for one core?
> 2. Is there any configuration that affects single core compaction
> throughput?
> 3. Is concurrent_compactors the only option to parallelize compaction? If
> so, I guess it's the compaction strategy itself that decides when to
> parallelize and when to block on one core. Then there's not much we can do
> here.
>
> Thanks.
>
> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
>> (depending on other load).  How many CPU cores do you have?
>>
>>
>> From: Kai Wang
>> Reply-To: "user@cassandra.apache.org"
>> Date: Friday, January 15, 2016 at 12:53 PM
>> To: "user@cassandra.apache.org"
>> Subject: compaction throughput
>>
>> Hi,
>>
>> I am trying to figure out the bottleneck of compaction on my node. The
>> node is CentOS 7 and has SSDs installed. The table is configured to use
>> LCS. Here is my compaction related configs in cassandra.yaml:
>>
>> compaction_throughput_mb_per_sec: 160
>> concurrent_compactors: 4
>>
>> I insert about 10G of data and start observing compaction.
>>
>> *nodetool compaction* shows that most of the time there is one compaction.
>> Sometimes there are 3-4 (I suppose this is controlled by
>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>> At that point, disk IO is about 20-25 M/s write which is much lower than
>> the disk is capable of. Even when there are 4 compactions running, I see
>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>> setcompactionthroughput 0* to disable the compaction throttle but don't
>> see any difference.
>>
>> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is
>> there any way to improve the throughput?
>>
>> Thanks.
>>
>
>


Re: compaction throughput

2016-01-15 Thread Sebastian Estevez
LCS is IO intensive, but CPU is also relevant.

On slower disks compaction may not be CPU bound.

If you aren't seeing more than one compaction thread at a time, I suspect
your system is not compaction bound.

all the best,

Sebastián
On Jan 15, 2016 7:20 PM, "Kai Wang" <dep...@gmail.com> wrote:

> Sebastian,
>
> Because I have this impression that LCS is IO intensive and it's
> recommended only on SSDs. So I am curious to see how far it can stress
> those SSDs. But it turns out the most expensive part about LCS is not IO
> bound but CPU bound, or more precisely single core speed bound. This is a
> little surprising.
>
> Of course LCS is still superior in other aspects.
> On Jan 15, 2016 6:34 PM, "Sebastian Estevez" <
> sebastian.este...@datastax.com> wrote:
>
>> Correct.
>>
>> Why are you concerned with the raw throughput, are you accumulating
>> pending compactions? Are you seeing high sstables per read statistics?
>>
>> all the best,
>>
>> Sebastián
>> On Jan 15, 2016 6:18 PM, "Kai Wang" <dep...@gmail.com> wrote:
>>
>>> Jeff & Sebastian,
>>>
>>> Thanks for the reply. There are 12 cores but in my case C* only uses one
>>> core most of the time. *nodetool compactionstats* shows there's only
>>> one compactor running. I can see C* process only uses one core. So I guess
>>> I should've asked the question more clearly:
>>>
>>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>>> 2. Is there any configuration that affects single core compaction
>>> throughput?
>>> 3. Is concurrent_compactors the only option to parallelize compaction?
>>> If so, I guess it's the compaction strategy itself that decides when to
>>> parallelize and when to block on one core. Then there's not much we can do
>>> here.
>>>
>>> Thanks.
>>>
>>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>>> wrote:
>>>
>>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
>>>> (depending on other load).  How many CPU cores do you have?
>>>>
>>>>
>>>> From: Kai Wang
>>>> Reply-To: "user@cassandra.apache.org"
>>>> Date: Friday, January 15, 2016 at 12:53 PM
>>>> To: "user@cassandra.apache.org"
>>>> Subject: compaction throughput
>>>>
>>>> Hi,
>>>>
>>>> I am trying to figure out the bottleneck of compaction on my node. The
>>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>>> LCS. Here is my compaction related configs in cassandra.yaml:
>>>>
>>>> compaction_throughput_mb_per_sec: 160
>>>> concurrent_compactors: 4
>>>>
>>>> I insert about 10G of data and start observing compaction.
>>>>
>>>> *nodetool compaction* shows that most of the time there is one compaction.
>>>> Sometimes there are 3-4 (I suppose this is controlled by
>>>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>>>> At that point, disk IO is about 20-25 M/s write which is much lower than
>>>> the disk is capable of. Even when there are 4 compactions running, I see
>>>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>>>> setcompactionthroughput 0* to disable the compaction throttle but
>>>> don't see any difference.
>>>>
>>>> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is
>>>> there any way to improve the throughput?
>>>>
>>>> Thanks.
>>>>
>>>
>>>


Re: compaction throughput

2016-01-15 Thread Kai Wang
Sebastian,

Because I have this impression that LCS is IO intensive and it's
recommended only on SSDs. So I am curious to see how far it can stress
those SSDs. But it turns out the most expensive part about LCS is not IO
bound but CPU bound, or more precisely single core speed bound. This is a
little surprising.

Of course LCS is still superior in other aspects.
On Jan 15, 2016 6:34 PM, "Sebastian Estevez" <sebastian.este...@datastax.com>
wrote:

> Correct.
>
> Why are you concerned with the raw throughput, are you accumulating
> pending compactions? Are you seeing high sstables per read statistics?
>
> all the best,
>
> Sebastián
> On Jan 15, 2016 6:18 PM, "Kai Wang" <dep...@gmail.com> wrote:
>
>> Jeff & Sebastian,
>>
>> Thanks for the reply. There are 12 cores but in my case C* only uses one
>> core most of the time. *nodetool compactionstats* shows there's only one
>> compactor running. I can see C* process only uses one core. So I guess I
>> should've asked the question more clearly:
>>
>> 1. Is ~25 M/s a reasonable compaction throughput for one core?
>> 2. Is there any configuration that affects single core compaction
>> throughput?
>> 3. Is concurrent_compactors the only option to parallelize compaction? If
>> so, I guess it's the compaction strategy itself that decides when to
>> parallelize and when to block on one core. Then there's not much we can do
>> here.
>>
>> Thanks.
>>
>> On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> With SSDs, the typical recommendation is up to 0.8-1 compactor per core
>>> (depending on other load).  How many CPU cores do you have?
>>>
>>>
>>> From: Kai Wang
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Friday, January 15, 2016 at 12:53 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: compaction throughput
>>>
>>> Hi,
>>>
>>> I am trying to figure out the bottleneck of compaction on my node. The
>>> node is CentOS 7 and has SSDs installed. The table is configured to use
>>> LCS. Here is my compaction related configs in cassandra.yaml:
>>>
>>> compaction_throughput_mb_per_sec: 160
>>> concurrent_compactors: 4
>>>
>>> I insert about 10G of data and start observing compaction.
>>>
>>> *nodetool compaction* shows that most of the time there is one compaction.
>>> Sometimes there are 3-4 (I suppose this is controlled by
>>> concurrent_compactors). During the compaction, I see one CPU core is 100%.
>>> At that point, disk IO is about 20-25 M/s write which is much lower than
>>> the disk is capable of. Even when there are 4 compactions running, I see
>>> CPU go to +400% but disk IO is still at 20-25M/s write. I use *nodetool
>>> setcompactionthroughput 0* to disable the compaction throttle but don't
>>> see any difference.
>>>
>>> Does this mean compaction is CPU bound? If so, 20M/s is kinda low. Is
>>> there any way to improve the throughput?
>>>
>>> Thanks.
>>>
>>
>>


RE: compaction throughput rate not even close to 16MB

2013-04-25 Thread Viktor Jevdokimov
Our experience with compactions shows that the more columns there are to merge 
for the same row, the more CPU it takes.

For example, testing and choosing between 2 data models with supercolumns (we 
still need supercolumns since composite columns lack some functionality):
  1. supercolumns with many columns
  2. supercolumns with one column (columns from model 1 merged to one blob 
value)
We found that compaction of model 2 performs 4 times faster.

The same for regular column families.





Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



-Original Message-
 From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
 Sent: Wednesday, April 24, 2013 23:38
 To: user@cassandra.apache.org
 Subject: Re: compaction throughput rate not even close to 16MB

 Thanks much!!!  Better to hear at least one other person sees the same thing
 ;).  Sometimes these posts just go silent.

 Dean

 From: Edward Capriolo edlinuxg...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, April 24, 2013 2:33 PM
 To: user@cassandra.apache.org
 Subject: Re: compaction throughput rate not even close to 16MB

 I have noticed the same. I think in the real world your compaction
 throughput is limited by other things. If I had to speculate I would say that
 compaction can remove expired tombstones, however doing this requires
 bloom filter checks, etc.

 I think that setting is more important with multi threaded compaction and/or
 more compaction slots. In those cases it may actually throttle something.


 On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean
 dean.hil...@nrel.gov wrote:
 I was wondering about the compactionthroughput.  I never see ours get
 even close to 16MB/s and I thought this is supposed to throttle compaction,
 right?  Ours is constantly less than 3MB/sec from looking at our logs, or do I
 have this totally wrong?  How can I see the real throughput so that I can
 understand how to throttle it when I need to?

 94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms =
 2.365603MB/s.  2,350,114 total rows, 2,350,022 unique.  Row merge counts
 were {1:2349930, 2:92, }

 Thanks,
 Dean






compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
I was wondering about the compactionthroughput.  I never see ours get even 
close to 16MB/s and I thought this is supposed to throttle compaction, right?  
Ours is constantly less than 3MB/sec from looking at our logs, or do I have this 
totally wrong?  How can I see the real throughput so that I can understand how 
to throttle it when I need to?

94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms = 2.365603MB/s.  
2,350,114 total rows, 2,350,022 unique.  Row merge counts were {1:2349930, 
2:92, }
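For reference, the rate in that log line is the output size divided by the
wall-clock time, so the throttle is measured against what compaction actually
wrote (a quick check from the shell):

python -c 'print(95346024 / 38.438 / 1048576)'   # ~2.3656 MB/s, as logged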

Thanks,
Dean





Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Edward Capriolo
I have noticed the same. I think in the real world your compaction
throughput is limited by other things. If I had to speculate I would say
that compaction can remove expired tombstones, however doing this requires
bloom filter checks, etc.

I think that setting is more important with multi threaded compaction
and/or more compaction slots. In those cases it may actually throttle
something.


On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I was wondering about the compactionthroughput.  I never see ours get even
 close to 16MB/s and I thought this is supposed to throttle compaction, right?
  Ours is constantly less than 3MB/sec from looking at our logs, or do I have
 this totally wrong?  How can I see the real throughput so that I can
 understand how to throttle it when I need to?

 94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms =
 2.365603MB/s.  2,350,114 total rows, 2,350,022 unique.  Row merge counts
 were {1:2349930, 2:92, }

 Thanks,
 Dean






Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Robert Coli
On Wed, Apr 24, 2013 at 1:33 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I think that setting is more important with multi threaded compaction and/or
 more compaction slots. In those cases it may actually throttle something.

Or if you're simultaneously doing a repair, which does a validation
compaction, which will (should?) also be subject to the throttle?

=Rob


Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Hiller, Dean
Thanks much!!!  Better to hear at least one other person sees the same thing 
;).  Sometimes these posts just go silent.

Dean

From: Edward Capriolo edlinuxg...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, April 24, 2013 2:33 PM
To: user@cassandra.apache.org
Subject: Re: compaction throughput rate not even close to 16MB

I have noticed the same. I think in the real world your compaction throughput 
is limited by other things. If I had to speculate I would say that compaction 
can remove expired tombstones, however doing this requires bloom filter checks, 
etc.

I think that setting is more important with multi threaded compaction and/or 
more compaction slots. In those cases it may actually throttle something.


On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
I was wondering about the compactionthroughput.  I never see ours get even 
close to 16MB/s and I thought this is supposed to throttle compaction, right?  
Ours is constantly less than 3MB/sec from looking at our logs, or do I have this 
totally wrong?  How can I see the real throughput so that I can understand how 
to throttle it when I need to?

94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms = 2.365603MB/s.  
2,350,114 total rows, 2,350,022 unique.  Row merge counts were {1:2349930, 
2:92, }

Thanks,
Dean






Re: compaction throughput rate not even close to 16MB

2013-04-24 Thread Wei Zhu
Same here. We disabled the throttling and our disk and CPU usage are both low 
(< 10%), yet it still takes hours for LCS compaction to finish after a repair. 
For this cluster we don't delete any data, so we can rule out tombstones. Not 
sure what is holding compaction back. My observation is that for LCS 
compactions which involve a large number of SSTables (since we set the SSTable 
size too small at 10M, one compaction sometimes involves up to 10 G of data = 
1000 SSTables), the throughput is smaller. So my theory is that the opening and 
closing of file handles has a substantial impact on the throughput.

By the way, we are on SSD.
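For anyone hitting the same thing: the LCS SSTable target size can be raised
per table (a sketch; the keyspace/table names and the 160MB figure are
illustrative, and existing SSTables are only rewritten gradually as future
compactions touch them):

cqlsh -e "ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': '160'};"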

-Wei


 From: Hiller, Dean dean.hil...@nrel.gov
To: user@cassandra.apache.org
Sent: Wednesday, April 24, 2013 1:37 PM
Subject: Re: compaction throughput rate not even close to 16MB
 

Thanks much!!!  Better to hear at least one other person sees the same thing 
;).  Sometimes these posts just go silent.

Dean

From: Edward Capriolo edlinuxg...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, April 24, 2013 2:33 PM
To: user@cassandra.apache.org
Subject: Re: compaction throughput rate not even close to 16MB

I have noticed the same. I think in the real world your compaction throughput 
is limited by other things. If I had to speculate I would say that compaction 
can remove expired tombstones, however doing this requires bloom filter checks, 
etc.

I think that setting is more important with multi threaded compaction and/or 
more compaction slots. In those cases it may actually throttle something.


On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
I was wondering about the compactionthroughput.  I never see ours get even 
close to 16MB/s and I thought this is supposed to throttle compaction, right?  
Ours is constantly less than 3MB/sec from looking at our logs, or do I have this 
totally wrong?  How can I see the real throughput so that I can understand how 
to throttle it when I need to?

94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms = 2.365603MB/s.  
2,350,114 total rows, 2,350,022 unique.  Row merge counts were {1:2349930, 
2:92, }

Thanks,
Dean