Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
As to why I think it's cluster-wide, here's what the documentation says:

https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
(Default: 16) Throttles compaction to the specified total throughput
across the entire system. The faster you insert data, the faster you need
to compact in order to keep the SSTable count down. The recommended value
is 16 to 32 times the rate of write throughput (in MB/second). Setting the
value to 0 disables compaction throttling.

Perhaps "across the entire system" means "across all keyspaces for this
Cassandra node"?

Compare the above documentation with the subsequent entry, which
specifically calls out "a node":

concurrent_compactors
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__concurrent_compactors>
(Default: 1 per CPU core) Sets the number of concurrent compaction
processes allowed to run simultaneously on a node, not including validation
compactions for anti-entropy repair. Simultaneous compactions help preserve
read performance in a mixed read-write workload by mitigating the tendency
of small SSTables to accumulate during a single long-running compaction. If
compactions run too slowly or too fast, change
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
first.

I always thought it was per-node, and I'm guessing this is a case of
unclear documentation.

On Mon, Jan 4, 2016 at 5:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Why do you think it's cluster-wide? That param is per-node, and you can
> change it at runtime with nodetool (or via the JMX interface, using
> jconsole to connect to ip:7199).
>
>
>
> From: Ken Hancock
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, January 4, 2016 at 12:59 PM
> To: "user@cassandra.apache.org"
> Subject: compaction_throughput_mb_per_sec
>
> I was surprised the other day to discover that this was a cluster-wide
> setting. Why does that make sense?
>
> In a heterogeneous Cassandra deployment, say I have some old servers
> running spinning disks and I'm bringing on more nodes that use SSDs. I
> want different compaction throttling on different nodes to minimize the
> impact on read latency.
>
> I can already balance data ownership through either token allocation or
> vnode counts.
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput, which would require a rolling restart across the
> cluster.
>
>
>


Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Robert Coli
On Tue, Jan 5, 2016 at 6:50 AM, Ken Hancock  wrote:

> As to why I think it's cluster-wide, here's what the documentation says:
>

Do you see "system" used in place of "cluster" anywhere else in the docs?

I think you are correct that the docs should standardize on "system"
instead of "node", because "node" to me includes vnodes. "System" or "host"
is what I think of as "the entire Cassandra process".

If I were you, I'd email docs AT datastaxdotcom with your feedback. :D

=Rob


Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
Will do. I searched the doc for additional usage of the term "system":

commitlog_segment_size_in_mb refers to "every table in the system"
concurrent_writes talks about CPU cores "in your system"

That's it for "system", other than compaction_throughput_mb_per_sec, which
refers to "across the entire system".

"Node" is the predominant term in the yaml configuration, though I can
certainly see potential confusion with vnodes.



On Tue, Jan 5, 2016 at 2:26 PM, Robert Coli <rc...@eventbrite.com> wrote:

> [snip]


-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC


Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Jack Krupansky
I forwarded a comment to the docs team.

It appears that they picked the language up from the cassandra.yaml file
itself. Looking at the use of "system" in that file, it seems that it
usually means the node, i.e., the box running the node.

-- Jack Krupansky

On Tue, Jan 5, 2016 at 9:50 AM, Ken Hancock <ken.hanc...@schange.com> wrote:

> As to why I think it's cluster-wide, here's what the documentation says:
>
> [snip]


Re: compaction_throughput_mb_per_sec

2016-01-04 Thread Carl Yeksigian
This is set in the cassandra.yaml on each node independently; it doesn't
have to be the same cluster-wide.
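
For example (these numbers are purely illustrative), a heterogeneous
cluster could run different per-node values:

    # cassandra.yaml on an older spinning-disk node
    compaction_throughput_mb_per_sec: 16

    # cassandra.yaml on a newer SSD node
    compaction_throughput_mb_per_sec: 64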

On Mon, Jan 4, 2016 at 3:59 PM, Ken Hancock  wrote:

> I was surprised the other day to discover that this was a cluster-wide
> setting. Why does that make sense?
>
> [snip]


Re: compaction_throughput_mb_per_sec

2016-01-04 Thread Jeff Jirsa
Why do you think it's cluster-wide? That param is per-node, and you can change
it at runtime with nodetool (or via the JMX interface, using jconsole to
connect to ip:7199).
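
Something like the following (host and value are placeholders, and the
exact JMX attribute name may vary by version):

    # check the current per-node throttle
    nodetool -h 10.0.0.1 getcompactionthroughput

    # raise it to 48 MB/s on this node only; takes effect immediately,
    # but reverts to the cassandra.yaml value on restart
    nodetool -h 10.0.0.1 setcompactionthroughput 48

    # or in jconsole connected to 10.0.0.1:7199, edit the
    # CompactionThroughputMbPerSec attribute on the
    # org.apache.cassandra.db:type=StorageService MBean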



From:  Ken Hancock
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, January 4, 2016 at 12:59 PM
To:  "user@cassandra.apache.org"
Subject:  compaction_throughput_mb_per_sec

I was surprised the other day to discover that this was a cluster-wide
setting. Why does that make sense?

In a heterogeneous Cassandra deployment, say I have some old servers
running spinning disks and I'm bringing on more nodes that use SSDs. I want
different compaction throttling on different nodes to minimize the impact
on read latency.

I can already balance data ownership through either token allocation or
vnode counts.

Also, as I increase my node count, I technically also have to increase my
compaction_throughput, which would require a rolling restart across the
cluster.







Re: compaction_throughput_mb_per_sec

2016-01-04 Thread Nate McCall
>
>> Also, as I increase my node count, I technically also have to increase my
>> compaction_throughput which would require a rolling restart across the
>> cluster.
>>
>>
> You can set compaction throughput on each node dynamically via nodetool
> setcompactionthroughput.
>
>
>
Also, the IOPS generated by your workload, and how efficiently the JVM
handles them, are what should drive compaction throughput settings. Raw node
count is orthogonal.
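
A rough way to sanity-check that on a given node (stock nodetool; output
format varies by version):

    # if the pending compaction count here keeps growing, the current
    # throttle is too low for this node's write workload
    nodetool compactionstats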


Re: compaction_throughput_mb_per_sec

2016-01-04 Thread Nate McCall
>
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput which would require a rolling restart across the
> cluster.
>
>
You can set compaction throughput on each node dynamically via nodetool
setcompactionthroughput.
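
For example (value illustrative; the change lasts only until the next
restart):

    # raise the throttle on a node that's falling behind
    nodetool setcompactionthroughput 64

    # per the yaml docs quoted elsewhere in the thread, 0 disables
    # throttling entirely
    nodetool setcompactionthroughput 0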


-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com