In my experience, that's not entirely true. For large nodes, the bottleneck is usually the JVM garbage collector. GC pauses can easily get out of control on very large heaps, and long stop-the-world (STW) pauses can also make a node appear to flap up and down from other nodes' perspective, which often renders the entire cluster unstable.
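
As an illustration only (the heap size and pause target below are assumptions, not a recommendation for any particular workload), keeping the heap modest and using G1 with a pause-time goal is one common way to keep STW pauses bounded, e.g. in the node's JVM options file (the exact file name varies by Cassandra version):

    -Xms16G
    -Xmx16G
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=300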

Using RF=1 is also strongly discouraged, even with reliable and durable storage. With RF=1 you lose not only data replication but also high availability. If any node in the cluster becomes unavailable, the token range(s) owned by that node become inaccessible, causing some or all CQL queries to fail. That means many routine maintenance tasks, such as upgrading and restarting nodes, will introduce downtime for the cluster. To get both strong consistency and HA, RF=3 is recommended.
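
For example (a sketch only; the keyspace and data centre names below are made up), with NetworkTopologyStrategy and RF=3, QUORUM reads and writes keep working while any single replica is down:

    CREATE KEYSPACE IF NOT EXISTS my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

    -- in cqlsh: QUORUM means 2 of the 3 replicas must respond
    CONSISTENCY QUORUM;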


On 17/08/2023 20:40, daemeon reiydelle wrote:
A lot of (actually all) of this seems to be based on local nodes with 1Gb networks of spinning rust. Much of what is mentioned below is TOTALLY wrong for cloud. So clarify whether you are "real world" or rusty slow data center world (definitely not modern DC either).

E.g. "should not handle more than 2TB of ACTIVE disk" was for spinning rust with maybe 1Gb networks. 10TB of modern high-speed SSD is more typical with 10 or 40Gb networks. If data is persisted to cloud storage, replication should be 1; VMs fail over to new hardware. Obviously if your storage is ephemeral, you have a different discussion. More of a monologue with an idiot in Finance, but ....
Arthur C. Clarke famously said that "technology sufficiently advanced is indistinguishable from magic." Magic is coming, and it's coming for all of us....
Daemeon Reiydelle
email: daeme...@gmail.com
LI: https://www.linkedin.com/in/daemeonreiydelle/
San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle


On Thu, Aug 17, 2023 at 6:13 AM Bowen Song via user <user@cassandra.apache.org> wrote:

    Just pointing out the obvious: for 1PB of data on nodes with 2TB of
    disk each, you will need far more than 500 nodes.

    1, it is unwise to run Cassandra with a replication factor of 1. It
    usually makes sense to use RF=3, so 1PB of data will cost 3PB of
    storage space, i.e. a minimum of 1,500 such nodes.

    2, depending on the compaction strategy you use and the write access
    pattern, there's disk space amplification to consider. For example,
    with STCS, the disk usage can be many times the actual live data size.

    3, you will need some extra free disk space as temporary space for
    running compactions.

    4, the data is rarely going to be perfectly evenly distributed among
    all nodes, and you need to take that into consideration and size the
    nodes based on the node with the most data.

    5, enough of the bad news, here's a good one: compression will save
    you (a lot of) disk space! (See the CQL sketch right after this list.)
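
    A minimal sketch of setting table compression (the keyspace and table
    names are hypothetical, and LZ4 is just one example codec):

        ALTER TABLE my_ks.my_table
          WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};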

    With all the above considered, you probably will end up with a lot
    more than the 500 nodes you initially thought. Your choice of
    compaction strategy and compression ratio can dramatically affect this
    calculation.
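
    As a rough, purely illustrative back-of-the-envelope calculation (the
    amplification, compression and fill-level figures are assumptions, not
    measurements from any real cluster):

        1 PB live data x 3 (RF)                           = 3 PB
        x ~0.4 (assuming roughly 2.5:1 compression)       = ~1.2 PB
        x ~2 (STCS amplification + compaction headroom)   = ~2.4 PB on disk
        / ~1.4 TB usable per 2TB node (~70% target fill)  = ~1,700 nodes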


    On 16/08/2023 16:33, Joe Obernberger wrote:
    > General question on how to configure Cassandra.  Say I have 1PByte of
    > data to store.  The general rule of thumb is that each node (or at
    > least instance of Cassandra) shouldn't handle more than 2TBytes of
    > disk.  That means 500 instances of Cassandra.
    >
    > Assuming you have very fast persistent storage (such as a NetApp,
    > PorterWorx etc.), would using Kubernetes or some orchestration layer
    > to handle those nodes be a viable approach?  Perhaps the worker nodes
    > would have enough RAM to run 4 instances (pods) of Cassandra, in which
    > case you would need 125 servers.
    > Another approach is to build your servers with 5 (or more) SSD devices
    > - one for the OS, four for each instance of Cassandra running on that
    > server.  Then build some scripts/ansible/puppet that would manage
    > Cassandra start/stops, and other maintenance items.
    >
    > Where I think this runs into problems is with repairs, or
    > sstablescrubs that can take days to run on a single instance.  How is
    > that handled 'in the real world'?  With seed nodes, how many would you
    > have in such a configuration?
    > Thanks for any thoughts!
    >
    > -Joe
    >
    >
