Re: Is my cluster normal?

Jeff Jirsa Thu, 07 Jul 2016 19:18:44 -0700

EBS iops scale with volume size.


A 600G EBS volume only guarantees 1800 iops – if you’re exhausting those on 
writes, you’re going to suffer on reads.
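As a rough check of that figure, here is a minimal sketch of the arithmetic, assuming the volume is a general-purpose SSD (gp2) volume whose baseline is 3 iops per GiB. The function name, the 100-iops floor, and the optional cap are illustrative assumptions and are not taken from the thread.

```python
def gp2_baseline_iops(size_gib, iops_per_gib=3, floor=100, cap=None):
    """Estimate baseline iops for a volume of the given size.

    iops_per_gib=3 matches the 600G -> 1800 iops figure above.
    floor and cap are assumptions; check them against current EBS limits.
    """
    iops = max(floor, size_gib * iops_per_gib)
    if cap is not None:
        iops = min(iops, cap)
    return iops


print(gp2_baseline_iops(600))  # 1800
```

Under the same assumptions, doubling the volume to 1200G would double the baseline to 3600 iops.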

 

You have a 16G server, and probably a good chunk of that allocated to heap. 
Consequently, you have almost no page cache, so your reads are going to hit the 
disk. Your reads being very low is not uncommon if you have no page cache – the 
default settings for Cassandra (64k compression chunks) are really inefficient 
for small reads served off of disk. If you drop the compression chunk size (4k, 
for example), you’ll probably see your read throughput increase significantly, 
which will give you more iops for commitlog, so write throughput likely goes 
up, too.
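To put a number on the chunk-size point, here is a simplified sketch. It assumes every small read misses the page cache and pulls exactly one compression chunk from disk, and it treats the chunk length as the amount read (ignoring the compression ratio), so it is an upper bound and not a measurement. The function name and the 100 reads/second input are illustrative.

```python
def disk_bytes_per_second(reads_per_second, chunk_kib):
    """Bytes fetched from disk per second if each read costs one full chunk."""
    return reads_per_second * chunk_kib * 1024


default_chunks = disk_bytes_per_second(100, 64)  # 64k compression chunks
small_chunks = disk_bytes_per_second(100, 4)     # 4k compression chunks

print(default_chunks)                  # 6553600
print(small_chunks)                    # 409600
print(default_chunks // small_chunks)  # 16
```

In this model the same read rate moves 16 times less data off disk with 4k chunks, which is the headroom that could then go to the commitlog.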

 

 

 

From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, July 7, 2016 at 6:54 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is my cluster normal?

 

What's your CPU looking like? If it's low, check your IO with iostat or dstat. 
I know some people have used EBS and say it's fine, but I've been burned too 
many times. 

On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <y...@kryptoncloud.com> wrote:

Hi Riccardo, 

 

Very low IO-wait. About 0.3%.

No stolen CPU. It is a Cassandra-only instance. I did not see any dropped 
messages.

 

 

ubuntu@cassandra1:/mnt/data$ nodetool tpstats

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     1         1      929509244         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         4         0        4021570         0                 0
RequestResponseStage              0         0      731477999         0                 0
ReadRepairStage                   0         0         165603         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                2        55          92022         0                 0
MemtableReclaimMemory             0         0           1736         0                 0
PendingRangeCalculator            0         0              6         0                 0
GossipStage                       0         0         345474         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0              4         0                 0
MigrationStage                    0         0             35         0                 0
MemtablePostFlush                 0         0           1973         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           1736         0                 0
InternalResponseStage             0         0           5311         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
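For anyone scanning output like this by hand, here is a hedged sketch of how the pool rows could be filtered for queues that are backing up. It assumes each pool row is a name followed by five integer columns, as in the output above. The sample rows are copied from that output. The helper names and the pending threshold of 10 are hypothetical choices, not anything nodetool provides.

```python
SAMPLE = """\
MutationStage                     1         1      929509244         0                 0
ReadStage                         4         0        4021570         0                 0
CompactionExecutor                2        55          92022         0                 0
Native-Transport-Requests       128       128      347508531         2          15891862
"""

COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text):
    """Return {pool name: {column: int}} for rows with a name and five integers."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return pools


def backed_up(pools, pending_threshold=10):
    """Names of pools with a long pending queue or any blocked tasks."""
    return [
        name
        for name, stats in pools.items()
        if stats["pending"] > pending_threshold
        or stats["blocked"] > 0
        or stats["all_time_blocked"] > 0
    ]


print(backed_up(parse_tpstats(SAMPLE)))
```

On the rows above this picks out CompactionExecutor (55 pending) and Native-Transport-Requests (128 pending, 2 blocked, 15891862 all time blocked).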

 

 

 

 

 

On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <ferra...@gmail.com> wrote:

Hi Yuan, 

 

Your machine instance has 4 vCPUs, that is 4 threads (not cores!). Aside from 
any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is 
way too much in my opinion. If that is the running average system load, I would 
look deeper into system details. Is that IO wait? Is that stolen CPU? Is that a 
Cassandra-only instance, or are there other processes pushing the load?
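A small sketch of the comparison being made here: load average divided by the number of hardware threads. The helper name is illustrative. The live reading at the end relies on `os.getloadavg()`, which is only available on Unix-like systems, so it is guarded.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Runnable-task load per hardware thread; above 1.0 means work is queueing."""
    return load_average / cpu_count


# Figures from this thread: load of 10 on a 4-vCPU instance.
print(load_per_cpu(10, 4))  # 2.5

# Optional live reading on the current host.
try:
    one_minute, _, _ = os.getloadavg()
    print(load_per_cpu(one_minute, os.cpu_count() or 1))
except (AttributeError, OSError):
    pass  # load average not available on this platform
```

A value of 2.5 means that on average two and a half tasks are running or waiting per thread, which is why the follow-up questions about IO wait and stolen CPU matter.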

What does your "nodetool tpstats" say? How many dropped messages do you have?

 

Best,

 

On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <y...@kryptoncloud.com> wrote:

Thanks Ben! From the post, it seems they got slightly better but similar 
results to mine. Good to know. 

I am not sure if a little fine tuning of heap memory will help or not.  

 

 

On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:

Hi Yuan, 

 

You might find this blog post a useful comparison:

https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/

 

Although the focus is on Spark and Cassandra and multi-DC there are also some 
single DC benchmarks of m4.xl clusters plus some discussion of how we went 
about benchmarking.

 

Cheers

Ben

 

 

On Fri, 8 Jul 2016 at 07:52 Yuan Fang <y...@kryptoncloud.com> wrote:

Yes, here is my stress test result: 

Results:
op rate                   : 12200 [WRITE:12200]
partition rate            : 12200 [WRITE:12200]
row rate                  : 12200 [WRITE:12200]
latency mean              : 16.4 [WRITE:16.4]
latency median            : 7.1 [WRITE:7.1]
latency 95th percentile   : 38.1 [WRITE:38.1]
latency 99th percentile   : 204.3 [WRITE:204.3]
latency 99.9th percentile : 465.9 [WRITE:465.9]
latency max               : 1408.4 [WRITE:1408.4]
Total partitions          : 1000000 [WRITE:1000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:01:21
END

 

On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <r...@foundev.pro> wrote:

Lots of variables you're leaving out.

 

It depends on write size, whether you're using logged batches or not, what 
consistency level, what RF, whether the writes come in bursts, etc. However, 
that's all sort of moot for determining "normal": really you need a baseline, 
as all those variables end up mattering a huge amount.

 

I would suggest using cassandra-stress as a baseline and going from there 
depending on what those numbers say (just pick the defaults).

Sent from my iPhone


On Jul 7, 2016, at 4:39 PM, Yuan Fang <y...@kryptoncloud.com> wrote:

yes, it is about 8k writes per node. 

 

 

 

On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daeme...@gmail.com> wrote:

Are you saying 7k writes per node? or 30k writes per node?



.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

 

On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <y...@kryptoncloud.com> wrote:

writes 30k/second is the main thing. 

 

 

On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daeme...@gmail.com> wrote:

Assuming you meant 100k, that's likely for something with 16mb of storage 
(probably way too small) where the data is more than 64k and hence will not fit 
into the row cache.



.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

 

On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <y...@kryptoncloud.com> wrote:

 

I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory and 600 GB SSD 
EBS). 

I can reach cluster-wide write requests of about 30k/second and read requests 
of about 100/second. The cluster OS load is constantly above 10. Is that normal?

 

Thanks!

 

 

Best,

 

Yuan 

 

 

 

 

 

 

-- 

———————— 

Ben Slater 

Chief Product Officer

Instaclustr: Cassandra + Spark - Managed | Consulting | Support

+61 437 929 798
