Re: hardware sizing for cassandra

Russell Bradberry Tue, 09 Sep 2014 14:14:31 -0700

*TL;DR*

There is no single recommended setup for Cassandra. Everyone's use case is
different, and it is up to you to figure out the best setup for yours.
There are a lot of questions that need to be asked before making a decision
on hardware layout.

------------------------

There is just so much wrong with this, it's hard to find where to begin. It
seems you are giving advice based on a specific use-case and not even one
that would necessarily match the case of the original poster.

> Every node should have at least 4 cores

While the more cores the better, there is no "minimum" recommended number of
cores. It depends solely on your use-case. We ran dual-core machines in
production successfully for over a year.

> with a maximum of 8

No. The more cores the better, period. Especially if you are running SSDs.
I have yet to see a machine max out IOPS on an SSD; context switches end up
becoming the bottleneck here.

> Memory shouldn't be higher than 32g

I have no idea where you are getting this. I have seen people running
nodes with 256GB of RAM, simply because they want as much or all of their
data to be memory resident. If you are running spindles you are lucky to
get 50ms response times on average; if the data is memory resident you can
get as low as 2-5ms response times.

> 16gb is good for a start

16GB is recommended simply because the max JVM heap you want to run is 8GB,
and this leaves 8GB for other system and Cassandra-related things. You
could go as low as 12GB if you have a low workload and nothing else on the
machine.

> Every node should be a physical machine, not a virtual one, or at least a
> virtual machine with an ssd hd subsystem.

I have no idea where you are getting this. Cassandra has no problem running
on VMs as long as the disks aren't shared. Many people, including Netflix,
which has one of the largest Cassandra deployments, are running in Amazon,
which is all VMs.

> The disk subsystem should be directly connected to the machine, no sans or
> fiber channel between.

Again, this is highly dependent on hardware. If you are running iSCSI and
your disks are dedicated and you have enough available IOPS to spare, you
will be fine.  I recently benchmarked 1TB EBS SSD drives in Amazon against
Ephemeral SSD drives with little noticeable difference.

> Cassandra is cpu and io bounded, so you should get the maximum io speed
> and a reasonable number of cores.

Cassandra is I/O bound only when running on spindle disks; it becomes
context-switch bound when running on SSDs. I have tried many times to
max out SSD IOPS without ever being able to.

> Number of nodes should be 3 at least with replication factor of 2.

The recommended setup is 3 nodes and an RF of 3 to be able to make quorum
reads/writes and survive an outage. But again, this is completely use-case
dependent.
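The quorum arithmetic behind that recommendation can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the one fact it relies on is that a quorum is a majority of the replicas (RF / 2, rounded down, plus 1).

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write to succeed."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


# Compare the RF of 2 suggested earlier in the thread with an RF of 3.
for rf in (2, 3):
    print(
        f"RF={rf}: quorum={quorum(rf)}, "
        f"can lose {tolerable_replica_failures(rf)} replica(s)"
    )
```

With an RF of 2 the quorum is both replicas, so losing a single replica blocks quorum operations on its data. With an RF of 3 the quorum is still 2, which leaves room for one replica to be down.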

> You should prefer more less powerful nodes than fewer more powerful nodes.

Finally, something we can agree on.

> Disk size depends on your workload, although you should always keep 50% of
> the disk free in the case repair sessions requires space, or perform sub
> range repairs.

Only if you are running STCS (size-tiered compaction). If you are running
LCS (leveled compaction) you can use much more than 50% of your available
space.
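To show how the free-space rule feeds into disk sizing, here is a rough back-of-envelope sketch in Python. Everything in it beyond the numbers quoted in this thread is an assumption: it treats the original poster's 1000 operations per second as all writes of 1 KB each, picks a hypothetical 30-day retention, and ignores compression, compaction overhead, indexes, and uneven data distribution. The function and parameter names are made up for the illustration.

```python
SECONDS_PER_DAY = 86_400


def raw_bytes_per_day(writes_per_sec: int, record_bytes: int) -> int:
    """Bytes written per day before replication (no compression assumed)."""
    return writes_per_sec * record_bytes * SECONDS_PER_DAY


def disk_needed_per_node(
    writes_per_sec: int,
    record_bytes: int,
    retention_days: int,      # hypothetical: how long data is kept
    replication_factor: int,
    nodes: int,
    max_disk_fill: float,     # fraction of the disk you allow to fill
) -> float:
    """Disk size per node, assuming data spreads evenly across nodes."""
    total = (
        raw_bytes_per_day(writes_per_sec, record_bytes)
        * retention_days
        * replication_factor
    )
    return total / nodes / max_disk_fill


# 1000 writes/s of 1 KB records, RF 3 on 3 nodes, keeping 50% of disk free.
needed = disk_needed_per_node(
    writes_per_sec=1000,
    record_bytes=1024,
    retention_days=30,
    replication_factor=3,
    nodes=3,
    max_disk_fill=0.5,
)
print(f"{needed / 1e12:.2f} TB per node")
```

The `max_disk_fill` parameter is where the compaction strategy comes in: 0.5 reflects the 50% rule discussed for STCS, and a higher value would apply if you run LCS and are comfortable filling more of the disk.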

On Tue, Sep 9, 2014 at 12:52 PM, Paolo Crosato <
paolo.cros...@targaubiest.com> wrote:

> Every node should have at least 4 cores, with a maximum of 8. Memory
> shouldn't be higher than 32g, 16gb is good for a start. Every node should
> be a physical machine, not a virtual one, or at least a virtual machine
> with an ssd hd subsystem. The disk subsystem should be directly connected
> to the machine, no sans or fiber channel between. Cassandra is cpu and io
> bounded, so you should get the maximum io speed and a reasonable number of
> cores.
>
> Number of nodes should be 3 at least with replication factor of 2. You
> should prefer more less powerful nodes than fewer more powerful nodes.
>
> Disk size depends on your workload, although you should always keep 50% of
> the disk free in the case repair sessions requires space, or perform sub
> range repairs.
>
> In my experience a 1GB link between nodes is ok, but the less lag the
> better.
>
> Summing up, if you need to save some money, get 4 cores and 16 GB of RAM;
> 32 is rarely needed and 64 a waste. 8 cores would probably be too much with
> 1000 writes a second.
>
> Paolo
>
>
>
> ........................................................................................
> Paolo Crosato
> Software engineer/Custom Solutions
>
>
> ________________________________________
> From: Chris Lohfink <clohf...@blackbirdit.com>
> Sent: Tuesday, 9 September 2014 21:26
> To: user@cassandra.apache.org
> Subject: Re: hardware sizing for cassandra
>
> It depends.  Ultimately your load is low enough a single node can probably
> handle it so you kinda want a "minimum" cluster.  Different people have
> different thoughts on what this means - I would recommend 5-6 nodes with a
> 3 replication factor.  (say m1.xlarge, or c3.2xlarge striped ephemerals, I
> like i2's but kinda overkill here).  Nodes with less than 16GB of RAM won't
> last long, so you should really start around there.
>
> Chris
>
> On Sep 9, 2014, at 11:02 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
>
> > Hi ,
> >    Where can I find the document with best practices about sizing for
> cassandra deployment?
> >    We have 1000 writes / reads per second. record size 1k.
> >
> > Questions:
> >    1) how many machines do we need?
> >    2) how many ram ,disc size / type?
> >    3) What should be network?
> >
> > I understand that hardware is very depends on data distribution and
> access pattern and other criteria, but I still want to believe that there
> is a best practice :-)
> >
> > Thanks
> > Oleg.
>
>
