Cassandra isn't Hadoop. Most of the mistakes you're making come from treating
one complex distributed system like a different complex distributed system
without understanding the nuances. Racks and DCs are separate concepts because
you wouldn't ever want all of the copies of a piece of data on one rack, in
case the top-of-rack switch or PDU failed (losing every copy, and with it the
ability to reach quorum). Both concepts are needed. In public cloud, it's like
regions and AZs - you may want 3 copies in each of 3 regions, but within a
region you don't want all 3 of those copies in the same AZ, or you risk an
outage if that AZ fails.
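
For example, assuming you're using GossipingPropertyFileSnitch (which reads
cassandra-rackdc.properties), each node declares its DC and rack like this -
the names below are just examples, and they're hard to change later, so pick
them carefully:

    # cassandra-rackdc.properties on a node in the first rack of DC1
    dc=DC1
    rack=RACK1

NetworkTopologyStrategy then uses those labels to spread replicas across
racks within each DC.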

Cassandra consistency levels and replication factors are tightly coupled,
giving the developer/operator a choice among the tradeoffs that arise during
failures, a la the CAP theorem.
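
As a concrete sketch (the keyspace name is made up, and the DC names must
match whatever your snitch reports), the per-DC replication factor is set on
the keyspace:

    CREATE KEYSPACE my_ks
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 3,
        'DC2': 3
      };

The consistency level is then chosen per query by the client, against
whatever RF the keyspace declares.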

Your application will specify a consistency level for each query. It's
typically QUORUM (for global consistency) or LOCAL_QUORUM (for reads/writes
contained entirely within a DC). QUORUM and LOCAL_QUORUM both require
((number of replicas) / 2 + 1) replicas, using integer division - for RF=3,
that's 2 of 3 alive; for RF=2, it's 2 of 2 alive.
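
To make that concrete, the arithmetic for a few RFs, plus how you'd set the
level in a cqlsh session (the table name is made up):

    -- quorum = (RF / 2) + 1, using integer division
    --   RF=3 -> quorum of 2 (tolerates 1 replica down)
    --   RF=2 -> quorum of 2 (tolerates 0 replicas down)
    --   RF=1 -> quorum of 1
    CONSISTENCY LOCAL_QUORUM
    SELECT * FROM my_ks.my_table WHERE id = 1;

Note that drivers set the consistency level programmatically; CONSISTENCY is
just the cqlsh command for interactive use.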

You should spend some time reading, or taking a course, before you try to
rush straight to production. Misunderstanding these concepts is common -
there are complex tradeoffs involved and often no "right" answer, just
different tradeoffs for different companies - but it's much harder to fix
things after you already have data (e.g. going from 1 rack to 2 racks is
almost impossible once you've got more data than can fit on one node; you
basically have to add a new DC, as sketched below), so take the time to
learn now, rather than rushing.
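
For reference, the usual escape hatch once you're in that position is to
stand up the correctly laid-out nodes as a new logical DC and rebuild into
it - a rough sketch only, with placeholder keyspace/DC names and RFs:

    -- 1. Bring up the new nodes as a new logical DC (e.g. DC1_v2), then add
    --    it to the keyspace's replication settings:
    ALTER KEYSPACE my_ks
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 2, 'DC1_v2': 3, 'DC2': 1
      };
    -- 2. On each node in the new DC, stream the existing data from the old
    --    DC:  nodetool rebuild -- DC1
    -- 3. Once clients have moved to the new DC and it's repaired, remove
    --    'DC1' from the keyspace and decommission the old nodes.

That's a lot of hardware churn, which is exactly why it's worth getting the
rack layout right up front.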






On Tue, Jul 12, 2022 at 5:49 AM Marc Hoppins <marc.hopp...@eset.com> wrote:

> The data guys want 2 copies of data in DC1 and that data to be replicated
> offsite to DC2 for 1 copy (DR purposes)
>
>
>
> If this setup doesn’t achieve this, what does?
>
>
>
> At least HBASE was simple enough in that everything could be configured as
> a giant blob of storage with HDFS taking care of keeping (at least) 1 copy
> out of the local system and 1 copy remotely
>
>
>
> *From:* Bowen Song via user <user@cassandra.apache.org>
> *Sent:* Tuesday, July 12, 2022 12:29 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> For RF=2 in your DC1, you will not be able to achieve both strong
> consistency and tolerance of a single node failure within that DC. You may
> want to think twice before proceeding with that.
>
> For RF=1 in your DC2, you will not be able to run DC-local repairs within
> that DC. You may also want to think twice about it.
>
> The rack setting specifies the logical rack of a node, and it affects how
> replicas are placed within the DC, but not how many replicas are in that
> DC. The RF affects how many copies of the data are in the DC, but not how
> they are placed. The rack and RF together work out how many copies there
> are and where to store them within a DC.
>
> In practice, RF should ideally be a whole multiple of the number of racks
> within the DC to ensure even distribution of the replicas among the nodes
> within the DC. That's why 1 rack will always work, and RF = 1*racks,
> RF = 2*racks, RF = 3*racks, etc. will also work.
>
>
>
> On 12/07/2022 10:34, Marc Hoppins wrote:
>
> The data guys’ plan is for table/keyspace NTS DC1= 2 and DC2= 1 across the
> board. Which leads me to…what is the point of having
> Cassandra-rackdc.properties RACK settings anyway?  If you can specify the
> local replication with the DC, having RACK specified elsewhere (whether it
> is a logical or physical rack) seems to be adding confusion to the pot.
>
>
>
> *From:* Bowen Song via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Sent:* Tuesday, July 12, 2022 11:23 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> I think you are misinterpreting many concepts here. For starters, a
> physical rack in a physical DC is not (does not have to be) a logical rack
> in a logical DC in Cassandra; and
> allocate_tokens_for_local_replication_factor has nothing to do with the
> replication factor (other than using it as an input), but has everything to
> do with token allocation.
>
> You need to plan for the number of logical (not physical) racks per DC:
> either number of racks = 1 and RF = any, or number of racks = RF within
> that DC. It's not impossible to add (or remove) a rack from an existing DC,
> but it's much better to plan ahead.
>
>
>
> On 12/07/2022 07:33, Marc Hoppins wrote:
>
> There is likely going to be 2 racks in each DC.
>
>
>
> Adding the new node decided to quit after 12 hours.  Node was overloaded
> and GC pauses caused the bootstrap to fail.  I begin to see the pattern
> here.  If replication is only within the same datacentre, and one starts
> off with only one rack then all data is within that rack, adding a new
> rack…but can only add one node at a time…will cause a surge of replication
> onto the one new node as this is now a failover point.  I noticed when
> checking netstats on the joining node that it was getting data from 12
> sources. This led me to the conclusion that ALL the streaming data was
> coming from every node in the same datacentre. I checked this by running
> netstats on other nodes in the second datacentre and they were all
> quiescent.  So, unlike HBASE/HDFS where we can spread the replication
> across sites, it seems that it is not a thing for this software.  Or do I
> have that wrong?
>
>
>
> Now, obviously, this is the second successive failure with adding a new
> node. ALL of the new nodes I need to add are in a new rack.
>
>
>
> # Replica factor is explicitly set, regardless of keyspace or datacenter.
>
> # This is the replica factor within the datacenter, like NTS.
>
> allocate_tokens_for_local_replication_factor: 3
>
>
>
> If this is going to happen every time I try to add a new node this is
> going to be an untenable situation.  Now, I am informed that the data in
> the cluster is not yet production, so it may be possible to wipe everything
> and start again, adding the new rack of nodes at create time. HOWEVER, this
> is then going to resurface when the next rack of nodes is added.  If the
> recommendation is to only add one node at a time to prevent problems with
> token ranges, data  or whatever, it is a serious limitation as not every
> business/organisation is going to have multiple racks available.
>
>
>
> *From:* Bowen Song via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Sent:* Monday, July 11, 2022 8:57 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> I've noticed the joining node has a different rack than the rest of the
> nodes, is this intended? Will you add all new nodes to this rack and have
> RF=2 in that DC?
>
> In principle, you should have an equal number of servers (vnodes) in each
> rack, and have the number of racks equal to RF, or 1.
>
>
>
> On 11/07/2022 13:15, Marc Hoppins wrote:
>
> All clocks are fine.
>
>
>
> Why would time sync affect whether or not a node appears in the
> nodetool status when running the command on a different node?  Either the
> node is up and visible or not.
>
>
>
> From 24 other nodes (including ba-freddy14 itself), it shows in the status.
>
>
>
> For those other 23 nodes AND the joining node, the one node which does not
> show the joining node (ba-freddy03) is itself visible to all other nodes
> when running nodetool.
>
>
>
> A sample set of nodetool output follows. If you look at the last status
> for freddy03 you will see that the joining node (ba-freddy14) does not
> appear, but when I started the join, and for the following 20-25 minutes,
> it DID appear in the status.  So I was just asking if anyone else had
> experienced this behaviour.
>
>
>
> (JOINING NODE) ba-freddy14:nodetool status -r
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  ba-freddy09   591.78 GiB  16      ?
> 9f7cdc62-2d5c-4d6e-be99-86c577131be5  SSW09
>
> UJ  ba-freddy14   117.37 GiB  16      ?
> bf85305e-256f-4eb9-9f15-5462f3b369b9  SSW05
>
> UN  ba-freddy06   614.26 GiB  16      ?
> 30d85b23-c66c-4781-86e9-960375caf476  SSW09
>
> UN  ba-freddy02   329.26 GiB  16      ?
> 3388ca94-5db5-4ef6-b7ab-e6fd0485ba49  SSW09
>
> UN  ba-freddy12   584.57 GiB  16      ?
> 80239a34-89cb-459b-a30f-4253bc16ed99  SSW09
>
> UN  ba-freddy07   563.51 GiB  16      ?
> 4de96ef6-bd48-4b16-bee1-05a0a6c9ac72  SSW09
>
> UN  ba-freddy01   578.5 GiB   16      ?
> 86a84980-2f8f-4d23-9099-d4b48ad9d04c  SSW09
>
> UN  ba-freddy05   575.33 GiB  16      ?
> 26c03d1b-9022-4e1c-bab4-d0d71bddf645  SSW09
>
> UN  ba-freddy10   581.16 GiB  16      ?
> 7c4051a5-1c77-4713-aa43-561063cedb3a  SSW09
>
> UN  ba-freddy08   605.92 GiB  16      ?
> 63fe46d1-c521-4df8-b1bb-ba0136168561  SSW09
>
> UN  ba-freddy04   585.65 GiB  16      ?
> 4503f80a-2890-4a3f-b0cb-d3cedc2b51d2  SSW09
>
> UN  ba-freddy11   576.46 GiB  16      ?
> b5b368fb-ebe3-4eed-a2a1-404b07ae2b6c  SSW09
>
> UN  ba-freddy03   568.95 GiB  16      ?
> 955f21a8-9bc8-4cef-b875-aa4cf7d3294c  SSW09
>
>
>
> Datacenter: DR1
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  dr1-freddy12  453.3 GiB   16      ?
> 533bb049-c8c9-41d9-8da6-64bdeeb6945d  SSW02
>
> UN  dr1-freddy08  448.99 GiB  16      ?
> 6e8c42d2-0f6d-4203-9bf7-5c5fe5e17093  SSW02
>
> UN  dr1-freddy07  450.07 GiB  16      ?
> 4c14b75a-74e8-4518-9c22-053b3a1ad991  SSW02
>
> UN  dr1-freddy02  453.69 GiB  16      ?
> e68298d7-e5eb-421f-a586-d5ee3c026627  SSW02
>
> UN  dr1-freddy10  453.17 GiB  16      ?
> 998bc6cb-7412-411a-89a6-ef5689d61a4a  SSW02
>
> UN  dr1-freddy05  463.07 GiB  16      ?
> 07876bd9-5374-4df8-a480-168b4c06f9f1  SSW02
>
> UN  dr1-freddy11  452.7 GiB   16      ?
> 38fca1c2-59da-4181-93a6-979b937b3fd9  SSW02
>
> UN  dr1-freddy03  460.23 GiB  16      ?
> a1ab1b4b-ccdc-4cb2-ad59-e9e67f0ddfbb  SSW02
>
> UN  dr1-freddy04  462.87 GiB  16      ?
> 29ee0eff-010d-4fbb-b204-095de2225031  SSW02
>
> UN  dr1-freddy06  454.26 GiB  16      ?
> 51467fd3-b795-4ba1-8eec-58b1030cb9c5  SSW02
>
> UN  dr1-freddy09  446.01 GiB  16      ?
> b071e232-b275-4ce7-809c-7c8fe546fbb4  SSW02
>
> UN  dr1-freddy01  450.6 GiB   16      ?
> c2340595-c3ec-440c-b978-62f62fd98a9a  SSW02
>
>
>
> ba-freddy06:nodetool status -r
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  ba-freddy09   591.59 GiB  16      ?
> 9f7cdc62-2d5c-4d6e-be99-86c577131be5  SSW09
>
> UJ  ba-freddy14   117.37 GiB  16      ?
> bf85305e-256f-4eb9-9f15-5462f3b369b9  SSW05
>
> UN  ba-freddy06   614.12 GiB  16      ?
>     30d85b23-c66c-4781-86e9-960375caf476  SSW09
>
> UN  ba-freddy02   329.03 GiB  16      ?
> 3388ca94-5db5-4ef6-b7ab-e6fd0485ba49  SSW09
>
> UN  ba-freddy12   584.4 GiB   16      ?
> 80239a34-89cb-459b-a30f-4253bc16ed99  SSW09
>
> UN  ba-freddy07   563.36 GiB  16      ?
> 4de96ef6-bd48-4b16-bee1-05a0a6c9ac72  SSW09
>
> UN  ba-freddy01   578.36 GiB  16      ?
> 86a84980-2f8f-4d23-9099-d4b48ad9d04c  SSW09
>
> UN  ba-freddy05   575.19 GiB  16      ?
> 26c03d1b-9022-4e1c-bab4-d0d71bddf645  SSW09
>
> UN  ba-freddy10   580.93 GiB  16      ?
> 7c4051a5-1c77-4713-aa43-561063cedb3a  SSW09
>
> UN  ba-freddy08   605.79 GiB  16      ?
> 63fe46d1-c521-4df8-b1bb-ba0136168561  SSW09
>
> UN  ba-freddy04   585.5 GiB   16      ?
> 4503f80a-2890-4a3f-b0cb-d3cedc2b51d2  SSW09
>
> UN  ba-freddy11   576.31 GiB  16      ?
> b5b368fb-ebe3-4eed-a2a1-404b07ae2b6c  SSW09
>
> UN  ba-freddy03   568.81 GiB  16      ?
> 955f21a8-9bc8-4cef-b875-aa4cf7d3294c  SSW09
>
>
>
> Datacenter: DR1
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  dr1-freddy12  453.15 GiB  16      ?
> 533bb049-c8c9-41d9-8da6-64bdeeb6945d  SSW02
>
> UN  dr1-freddy08  448.82 GiB  16      ?
> 6e8c42d2-0f6d-4203-9bf7-5c5fe5e17093  SSW02
>
> UN  dr1-freddy07  449.9 GiB   16      ?
> 4c14b75a-74e8-4518-9c22-053b3a1ad991  SSW02
>
> UN  dr1-freddy02  453.45 GiB  16      ?
> e68298d7-e5eb-421f-a586-d5ee3c026627  SSW02
>
> UN  dr1-freddy10  453.02 GiB  16      ?
>    998bc6cb-7412-411a-89a6-ef5689d61a4a  SSW02
>
> UN  dr1-freddy05  462.92 GiB  16      ?
> 07876bd9-5374-4df8-a480-168b4c06f9f1  SSW02
>
> UN  dr1-freddy11  452.55 GiB  16      ?
> 38fca1c2-59da-4181-93a6-979b937b3fd9  SSW02
>
> UN  dr1-freddy03  460.08 GiB  16      ?
> a1ab1b4b-ccdc-4cb2-ad59-e9e67f0ddfbb  SSW02
>
> UN  dr1-freddy04  462.72 GiB  16      ?
> 29ee0eff-010d-4fbb-b204-095de2225031  SSW02
>
> UN  dr1-freddy06  454.11 GiB  16      ?
> 51467fd3-b795-4ba1-8eec-58b1030cb9c5  SSW02
>
> UN  dr1-freddy09  445.78 GiB  16      ?
> b071e232-b275-4ce7-809c-7c8fe546fbb4  SSW02
>
> UN  dr1-freddy01  450.46 GiB  16      ?
> c2340595-c3ec-440c-b978-62f62fd98a9a  SSW02
>
>
>
> dr1-freddy04: nodetool status -r
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  ba-freddy09   592.05 GiB  16      ?
> 9f7cdc62-2d5c-4d6e-be99-86c577131be5  SSW09
>
> UJ  ba-freddy14   117.37 GiB  16      ?
> bf85305e-256f-4eb9-9f15-5462f3b369b9  SSW05
>
> UN  ba-freddy06   614.56 GiB  16      ?
> 30d85b23-c66c-4781-86e9-960375caf476  SSW09
>
> UN  ba-freddy02   329.57 GiB  16      ?
> 3388ca94-5db5-4ef6-b7ab-e6fd0485ba49  SSW09
>
> UN  ba-freddy12   584.88 GiB  16      ?
> 80239a34-89cb-459b-a30f-4253bc16ed99  SSW09
>
> UN  ba-freddy07   563.84 GiB  16      ?
> 4de96ef6-bd48-4b16-bee1-05a0a6c9ac72  SSW09
>
> UN  ba-freddy01   578.75 GiB  16      ?
> 86a84980-2f8f-4d23-9099-d4b48ad9d04c  SSW09
>
> UN  ba-freddy05   575.54 GiB  16      ?
> 26c03d1b-9022-4e1c-bab4-d0d71bddf645  SSW09
>
> UN  ba-freddy10   581.48 GiB  16      ?
> 7c4051a5-1c77-4713-aa43-561063cedb3a  SSW09
>
> UN  ba-freddy08   606.12 GiB  16      ?
> 63fe46d1-c521-4df8-b1bb-ba0136168561  SSW09
>
> UN  ba-freddy04   585.89 GiB  16      ?
> 4503f80a-2890-4a3f-b0cb-d3cedc2b51d2  SSW09
>
> UN  ba-freddy11   576.71 GiB  16      ?
> b5b368fb-ebe3-4eed-a2a1-404b07ae2b6c  SSW09
>
> UN  ba-freddy03   569.22 GiB  16      ?
> 955f21a8-9bc8-4cef-b875-aa4cf7d3294c  SSW09
>
>
>
> Datacenter: DR1
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  dr1-freddy12  453.6 GiB   16      ?
> 533bb049-c8c9-41d9-8da6-64bdeeb6945d  SSW02
>
> UN  dr1-freddy08  449.3 GiB   16      ?
> 6e8c42d2-0f6d-4203-9bf7-5c5fe5e17093  SSW02
>
> UN  dr1-freddy07  450.42 GiB  16      ?
> 4c14b75a-74e8-4518-9c22-053b3a1ad991  SSW02
>
> UN  dr1-freddy02  454.02 GiB  16      ?
> e68298d7-e5eb-421f-a586-d5ee3c026627  SSW02
>
> UN  dr1-freddy10  453.45 GiB  16      ?
> 998bc6cb-7412-411a-89a6-ef5689d61a4a  SSW02
>
> UN  dr1-freddy05  463.36 GiB  16      ?
> 07876bd9-5374-4df8-a480-168b4c06f9f1  SSW02
>
> UN  dr1-freddy11  453.01 GiB  16      ?
> 38fca1c2-59da-4181-93a6-979b937b3fd9  SSW02
>
> UN  dr1-freddy03  460.55 GiB  16      ?
> a1ab1b4b-ccdc-4cb2-ad59-e9e67f0ddfbb  SSW02
>
> UN  dr1-freddy04  463.19 GiB  16      ?
> 29ee0eff-010d-4fbb-b204-095de2225031  SSW02
>
> UN  dr1-freddy06  454.5 GiB   16      ?
> 51467fd3-b795-4ba1-8eec-58b1030cb9c5  SSW02
>
> UN  dr1-freddy09  446.3 GiB   16      ?
> b071e232-b275-4ce7-809c-7c8fe546fbb4  SSW02
>
> UN  dr1-freddy01  450.79 GiB  16      ?
> c2340595-c3ec-440c-b978-62f62fd98a9a  SSW02
>
>
>
> dr1-freddy11: nodetool status -r
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  ba-freddy09   592.14 GiB  16      ?
> 9f7cdc62-2d5c-4d6e-be99-86c577131be5  SSW09
>
> UJ  ba-freddy14   117.37 GiB  16      ?
> bf85305e-256f-4eb9-9f15-5462f3b369b9  SSW05
>
> UN  ba-freddy06   614.56 GiB  16      ?
> 30d85b23-c66c-4781-86e9-960375caf476  SSW09
>
> UN  ba-freddy02   329.57 GiB  16      ?
> 3388ca94-5db5-4ef6-b7ab-e6fd0485ba49  SSW09
>
> UN  ba-freddy12   584.88 GiB  16      ?
> 80239a34-89cb-459b-a30f-4253bc16ed99  SSW09
>
> UN  ba-freddy07   563.84 GiB  16      ?
> 4de96ef6-bd48-4b16-bee1-05a0a6c9ac72  SSW09
>
> UN  ba-freddy01   578.75 GiB  16      ?
> 86a84980-2f8f-4d23-9099-d4b48ad9d04c  SSW09
>
> UN  ba-freddy05   575.61 GiB  16      ?
> 26c03d1b-9022-4e1c-bab4-d0d71bddf645  SSW09
>
> UN  ba-freddy10   581.48 GiB  16      ?
> 7c4051a5-1c77-4713-aa43-561063cedb3a  SSW09
>
> UN  ba-freddy08   606.19 GiB  16      ?
> 63fe46d1-c521-4df8-b1bb-ba0136168561  SSW09
>
> UN  ba-freddy04   585.98 GiB  16      ?
> 4503f80a-2890-4a3f-b0cb-d3cedc2b51d2  SSW09
>
> UN  ba-freddy11   576.77 GiB  16      ?
> b5b368fb-ebe3-4eed-a2a1-404b07ae2b6c  SSW09
>
> UN  ba-freddy03   569.22 GiB  16      ?
> 955f21a8-9bc8-4cef-b875-aa4cf7d3294c  SSW09
>
>
>
> Datacenter: DR1
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  dr1-freddy12  453.6 GiB   16      ?
> 533bb049-c8c9-41d9-8da6-64bdeeb6945d  SSW02
>
> UN  dr1-freddy08  449.3 GiB   16      ?
> 6e8c42d2-0f6d-4203-9bf7-5c5fe5e17093  SSW02
>
> UN  dr1-freddy07  450.42 GiB  16      ?
> 4c14b75a-74e8-4518-9c22-053b3a1ad991  SSW02
>
> UN  dr1-freddy02  454.02 GiB  16      ?
> e68298d7-e5eb-421f-a586-d5ee3c026627  SSW02
>
> UN  dr1-freddy10  453.45 GiB  16      ?
> 998bc6cb-7412-411a-89a6-ef5689d61a4a  SSW02
>
> UN  dr1-freddy05  463.36 GiB  16      ?
> 07876bd9-5374-4df8-a480-168b4c06f9f1  SSW02
>
> UN  dr1-freddy11  453.01 GiB  16      ?
> 38fca1c2-59da-4181-93a6-979b937b3fd9  SSW02
>
> UN  dr1-freddy03  460.55 GiB  16      ?
> a1ab1b4b-ccdc-4cb2-ad59-e9e67f0ddfbb  SSW02
>
> UN  dr1-freddy04  463.19 GiB  16      ?
> 29ee0eff-010d-4fbb-b204-095de2225031  SSW02
>
> UN  dr1-freddy06  454.5 GiB   16      ?
> 51467fd3-b795-4ba1-8eec-58b1030cb9c5  SSW02
>
> UN  dr1-freddy09  446.3 GiB   16      ?
> b071e232-b275-4ce7-809c-7c8fe546fbb4  SSW02
>
> UN  dr1-freddy01  450.86 GiB  16      ?
> c2340595-c3ec-440c-b978-62f62fd98a9a  SSW02
>
>
>
> ba-freddy03: nodetool status -r
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  ba-freddy09   592.23 GiB  16      ?
> 9f7cdc62-2d5c-4d6e-be99-86c577131be5  SSW09
>
> UN  ba-freddy06   614.63 GiB  16      ?
> 30d85b23-c66c-4781-86e9-960375caf476  SSW09
>
> UN  ba-freddy02   329.66 GiB  16      ?
> 3388ca94-5db5-4ef6-b7ab-e6fd0485ba49  SSW09
>
> UN  ba-freddy12   584.97 GiB  16      ?
> 80239a34-89cb-459b-a30f-4253bc16ed99  SSW09
>
> UN  ba-freddy07   563.91 GiB  16      ?
> 4de96ef6-bd48-4b16-bee1-05a0a6c9ac72  SSW09
>
> UN  ba-freddy01   578.83 GiB  16      ?
> 86a84980-2f8f-4d23-9099-d4b48ad9d04c  SSW09
>
> UN  ba-freddy05   575.69 GiB  16      ?
> 26c03d1b-9022-4e1c-bab4-d0d71bddf645  SSW09
>
> UN  ba-freddy10   581.56 GiB  16      ?
> 7c4051a5-1c77-4713-aa43-561063cedb3a  SSW09
>
> UN  ba-freddy08   606.27 GiB  16      ?
> 63fe46d1-c521-4df8-b1bb-ba0136168561  SSW09
>
> UN  ba-freddy04   586.05 GiB  16      ?
> 4503f80a-2890-4a3f-b0cb-d3cedc2b51d2  SSW09
>
> UN  ba-freddy11   576.86 GiB  16      ?
> b5b368fb-ebe3-4eed-a2a1-404b07ae2b6c  SSW09
>
> UN  ba-freddy03   569.32 GiB  16      ?
> 955f21a8-9bc8-4cef-b875-aa4cf7d3294c  SSW09
>
>
>
> Datacenter: DR1
>
> ===============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address                            Load        Tokens  Owns  Host
> ID                               Rack
>
> UN  dr1-freddy12  453.68 GiB  16      ?
> 533bb049-c8c9-41d9-8da6-64bdeeb6945d  SSW02
>
> UN  dr1-freddy08  449.39 GiB  16      ?
> 6e8c42d2-0f6d-4203-9bf7-5c5fe5e17093  SSW02
>
> UN  dr1-freddy07  450.51 GiB  16      ?
> 4c14b75a-74e8-4518-9c22-053b3a1ad991  SSW02
>
> UN  dr1-freddy02  454.11 GiB  16      ?
> e68298d7-e5eb-421f-a586-d5ee3c026627  SSW02
>
> UN  dr1-freddy10  453.54 GiB  16      ?
> 998bc6cb-7412-411a-89a6-ef5689d61a4a  SSW02
>
> UN  dr1-freddy05  463.44 GiB  16      ?
> 07876bd9-5374-4df8-a480-168b4c06f9f1  SSW02
>
> UN  dr1-freddy11  453.1 GiB   16      ?
> 38fca1c2-59da-4181-93a6-979b937b3fd9  SSW02
>
> UN  dr1-freddy03  460.62 GiB  16      ?
> a1ab1b4b-ccdc-4cb2-ad59-e9e67f0ddfbb  SSW02
>
> UN  dr1-freddy04  463.27 GiB  16      ?
> 29ee0eff-010d-4fbb-b204-095de2225031  SSW02
>
> UN  dr1-freddy06  454.57 GiB  16      ?
> 51467fd3-b795-4ba1-8eec-58b1030cb9c5  SSW02
>
> UN  dr1-freddy09  446.39 GiB  16      ?
> b071e232-b275-4ce7-809c-7c8fe546fbb4  SSW02
>
> UN  dr1-freddy01  450.94 GiB  16      ?
> c2340595-c3ec-440c-b978-62f62fd98a9a  SSW02
>
>
>
> *From:* Joe Obernberger <joseph.obernber...@gmail.com>
> <joseph.obernber...@gmail.com>
> *Sent:* Monday, July 11, 2022 1:29 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> I too came from HBase and discovered adding several nodes at a time
> doesn't work.  Are you absolutely sure that the clocks are in sync across
> the nodes?  This has bitten me several times.
>
> -Joe
>
> On 7/11/2022 6:23 AM, Bowen Song via user wrote:
>
> You should look for warning and error level logs in the system.log, not
> the debug.log or gc.log, and certainly not only the latest lines.
>
> BTW, you may want to spend some time investigating potential GC issues
> based on the GC logs you provided. I can see 1 full GC in the 3 hours since
> the node started. It's not necessarily a problem (if it only occasionally
> happens during the initial bootstraping process), but it should justify an
> investigation if this is the first time you've seen it.
>
> On 11/07/2022 11:09, Marc Hoppins wrote:
>
> Service still running. No errors showing.
>
>
>
> The latest info is in debug.log
>
>
>
> DEBUG [Streaming-EventLoop-4-3] 2022-07-11 12:00:38,902
> NettyStreamingMessageSender.java:258 - [Stream
> #befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Sending keep-alive
>
> DEBUG [Stream-Deserializer-/10.1.146.174:7000-053f2911] 2022-07-11
> 12:00:39,790 StreamingInboundHandler.java:179 - [Stream
> #befbc5d0-00e7-11ed-860a-a139feb6a78a channel: 053f2911] Received keep-alive
>
> DEBUG [ScheduledTasks:1] 2022-07-11 12:00:44,688 StorageService.java:2398
> - Ignoring application state LOAD from /x.x.x.64:7000 because it is not a
> member in token metadata
>
> DEBUG [ScheduledTasks:1] 2022-07-11 12:01:44,689 StorageService.java:2398
> - Ignoring application state LOAD from /x.x.x.64:7000 because it is not a
> member in token metadata
>
> DEBUG [ScheduledTasks:1] 2022-07-11 12:02:44,690 StorageService.java:2398
> - Ignoring application state LOAD from /x.x.x.64:7000 because it is not a
> member in token metadata
>
>
>
> And
>
>
>
> gc.log.1.current
>
>
>
> 2022-07-11T12:08:40.562+0200: 11122.837: [GC (Allocation Failure)
> 2022-07-11T12:08:40.562+0200: 11122.838: [ParNew
>
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
>
> - age   1:      57264 bytes,      57264 total
>
> : 655440K->74K(737280K), 0.0289143 secs] 2575800K->1920436K(8128512K),
> 0.0291355 secs] [Times: user=0.23 sys=0.00, real=0.03 secs]
>
> Heap after GC invocations=6532 (full 1):
>
> par new generation   total 737280K, used 74K [0x00000005cae00000,
> 0x00000005fce00000, 0x00000005fce00000)
>
>   eden space 655360K,   0% used [0x00000005cae00000, 0x00000005cae00000,
> 0x00000005f2e00000)
>
>   from space 81920K,   0% used [0x00000005f2e00000, 0x00000005f2e12848,
> 0x00000005f7e00000)
>
>   to   space 81920K,   0% used [0x00000005f7e00000, 0x00000005f7e00000,
> 0x00000005fce00000)
>
> concurrent mark-sweep generation total 7391232K, used 1920362K
> [0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)
>
> Metaspace       used 53255K, capacity 56387K, committed 56416K, reserved
> 1097728K
>
>   class space    used 6926K, capacity 7550K, committed 7576K, reserved
> 1048576K
>
> }
>
> 2022-07-11T12:08:40.591+0200: 11122.867: Total time for which application
> threads were stopped: 0.0309913 seconds, Stopping threads took: 0.0012599
> seconds
>
> {Heap before GC invocations=6532 (full 1):
>
> par new generation   total 737280K, used 655434K [0x00000005cae00000,
> 0x00000005fce00000, 0x00000005fce00000)
>
>   eden space 655360K, 100% used [0x00000005cae00000, 0x00000005f2e00000,
> 0x00000005f2e00000)
>
>   from space 81920K,   0% used [0x00000005f2e00000, 0x00000005f2e12848,
> 0x00000005f7e00000)
>
>   to   space 81920K,   0% used [0x00000005f7e00000, 0x00000005f7e00000,
> 0x00000005fce00000)
>
> concurrent mark-sweep generation total 7391232K, used 1920362K
> [0x00000005fce00000, 0x00000007c0000000, 0x00000007c0000000)
>
> Metaspace       used 53255K, capacity 56387K, committed 56416K, reserved
> 1097728K
>
>   class space    used 6926K, capacity 7550K, committed 7576K, reserved
> 1048576K
>
> 2022-07-11T12:08:42.163+0200: 11124.438: [GC (Allocation Failure)
> 2022-07-11T12:08:42.163+0200: 11124.438: [ParNew
>
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
>
> - age   1:      54984 bytes,      54984 total
>
> : 655434K->80K(737280K), 0.0291754 secs] 2575796K->1920445K(8128512K),
> 0.0293884 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
>
> *From:* Bowen Song via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Sent:* Monday, July 11, 2022 11:56 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> Checking on multiple nodes won't help if the joining node suffers from any
> of the issues I described, as it will likely be flipping up and down
> frequently, and the existing nodes in the cluster may never reach an
> agreement before the joining node stays up (or stays down) for a while.
> However, it will be a very strange thing if this is a persistent behaviour.
> If the 'nodetool status' output on each node remained unchanged for hours
> and the outputs aren't the same between nodes, it could be an indicator of
> something else that had gone wrong.
>
> Does the strange behaviour go away after the joining node completes the
> streaming and fully joins the cluster?
>
> On 11/07/2022 10:46, Marc Hoppins wrote:
>
> I am beginning to wonder…
>
>
>
> If you recall, I stated that I had checked status on a bunch of other
> nodes from both datacentres and the joining node shows up. No errors are
> occurring anywhere; data is streaming; node is joining…but, as I also
> stated, on the initial node which I only used to run the nodetool status,
> the new node is no longer showing up.  Thus the new node has not
> disappeared from the cluster, only from nodetool status on that particular
> node – which is already in the cluster, has been so for several weeks, and
> is also functioning without error.
>
>
>
> *From:* Bowen Song via user <user@cassandra.apache.org>
> <user@cassandra.apache.org>
> *Sent:* Monday, July 11, 2022 11:40 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Adding nodes
>
>
>
>
> A node in joining state can disappear from the cluster from other nodes'
> perspective if the joining node stops sending/receiving gossip messages to
> other nodes. This can happen when the joining node is severely overloaded,
> has bad network connectivity or is stuck in long STW GC pauses. Regardless
> of the reason behind it, the state shown on the joining node will remain as
> joining unless the streaming process has failed.
>
> The node state is propagated between nodes via gossip, and there may be a
> delay before all existing nodes agree on the fact that the joining node is
> no longer in the cluster. Within that delay, different nodes in the cluster
> may show different results in 'nodetool status'.
>
> You should check the logs on the existing nodes and the joining node to
> find out why is it happening, and make appropriate changes if needed.
>
> On 11/07/2022 09:23, Marc Hoppins wrote:
>
> Further oddities…
>
>
>
> I was sitting here watching our new new node being added (nodetool status
> being run from one of the seed nodes) and all was going well.  Then I
> noticed that our new new node was no longer visible.  I checked the service
> on the new new node and it was still running. So I checked status from this
> node and it shows in the status report (still UJ and streaming data), but
> takes a little longer to get the results than it did when it was visible
> from the seed.
>
>
>
> I checked status from a few different nodes in both datacentres (including
> other seeds) and the new new node shows up but from the original seed node,
> it does not appear in the nodetool status. Can anyone shed any light on
> this phenomenon?
>
>
>
> *From:* Marc Hoppins <marc.hopp...@eset.com> <marc.hopp...@eset.com>
> *Sent:* Monday, July 11, 2022 10:02 AM
> *To:* user@cassandra.apache.org
> *Cc:* Bowen Song <bo...@bso.ng> <bo...@bso.ng>
> *Subject:* RE: Adding nodes
>
>
>
> Well then…
>
>
>
> I left this on Friday (still running) and came back to it today (Monday)
> to find the service stopped.  So, I blitzed this node from the ring and
> began anew with a different new node.
>
>
>
> I rather suspect the problem was with trying to use Ansible to add these
> initially - despite the fact that I had a serial limit of 1 and a pause of
> 90s for starting the service on each new node (based on the time taken when
> setting up this Cassandra cluster).
>
>
>
> So…moving forward…
>
>
>
> It is recommended to only add one new node at a time from what I read.
> This leads me to:
>
>
>
> Although I see the new node LOAD is progressing far faster than the
> previous failure, it is still going to take several hours to move from UJ
> to UN, which means I’ll be at this all week for the 12 new nodes. If our
> LOAD per node is around 400-600GB, is there any practical method to speed
> up adding multiple new nodes which is unlikely to cause problems?  After
> all, in the modern world of big (how big is big?) data, 600G per node is
> far less than the real BIG big-data.
>
>
>
> Marc
>
>
>
> *From:* Jeff Jirsa <jji...@gmail.com>
> *Sent:* Friday, July 8, 2022 5:46 PM
> *To:* cassandra <user@cassandra.apache.org>
> *Cc:* Bowen Song <bo...@bso.ng>
> *Subject:* Re: Adding nodes
>
>
>
>
> Having a node UJ but not sending/receiving other streams is an invalid
> state (unless 4.0 moved the streaming data out of netstats? I'm not 100%
> sure, but I'm 99% sure it should be there).
>
>
>
> It likely stopped the bootstrap process long ago with an error (which you
> may not have seen), and is running without being in the ring, but also not
> trying to join the ring.
>
>
>
> 145GB vs 1.1T could be bits vs bytes (that's a factor of 8), or it could
> be that you streamed data and compacted it away. Hard to say, but less
> important - the fact that it's UJ but not streaming means there's a
> different problem.
>
>
>
> If it's me, I do this (not guaranteed to work, your mileage may vary, etc):
>
> 1) Look for errors in the logs of ALL hosts. In the joining host, look for
> an exception that stops bootstrap. In the others, look for messages about
> errors streaming, and/or exceptions around file access. In all of those
> hosts, check to see if any of them think they're streaming ( nodetool
> netstats again)
>
> 2) Stop the joining host. It's almost certainly not going to finish now.
> Remove data directories, commitlog directory, saved caches, hints. Wait 2
> minutes. Make sure every other host in the cluster sees it disappear from
> the ring. Then, start it fresh and let it bootstrap again. (you could
> alternatively try the resumable bootstrap option, but I never use it).
>
>
>
>
>
>
>
> On Fri, Jul 8, 2022 at 2:56 AM Marc Hoppins <marc.hopp...@eset.com> wrote:
>
> Ifconfig shows RX of 1.1T. This doesn't seem to fit with the LOAD of
> 145GiB (nodetool status), unless I am reading that wrong...and the fact
> that this node still has a status of UJ.
>
> Netstats on this node shows (other than :
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         0              0         0
> Small messages                  n/a        53      569755545  15740262
> Gossip messages                 n/a         0         288878         2
> None of this addresses the issue of not being able to add more nodes.
>
> -----Original Message-----
> From: Bowen Song via user <user@cassandra.apache.org>
> Sent: Friday, July 8, 2022 11:47 AM
> To: user@cassandra.apache.org
> Subject: Re: Adding nodes
>
>
>
> I would assume that's 85 GB (i.e. gigabytes) then. Which is approximately
> 79 GiB (i.e. gibibytes). This still sounds awfully slow - less than 1MB/s
> over a full day (24 hours).
>
> You said CPU and network aren't the bottleneck. Have you checked the disk
> IO? Also, be mindful with CPU usage. It can still be a bottleneck if one
> thread uses 100% of a CPU core while all other cores are idle.
>
> On 08/07/2022 07:09, Marc Hoppins wrote:
> > Thank you for pointing that out.
> >
> > 85 gigabytes/gibibytes/GIGABYTES/GIBIBYTES/whatever name you care to
> > give it
> >
> > CPU and bandwidth are not the problem.
> >
> > Version 4.0.3 but, as I stated, all nodes use the same version so the
> version is not important either.
> >
> > Existing nodes have 350-400+(choose whatever you want to call a
> > gigabyte)
> >
> > The problem appears to be that adding new nodes is a serial process,
> which is fine when there is no data and each node is added within
> 2 minutes.  It is hardly practical in production.
> >
> > -----Original Message-----
> > From: Bowen Song via user <user@cassandra.apache.org>
> > Sent: Thursday, July 7, 2022 8:43 PM
> > To: user@cassandra.apache.org
> > Subject: Re: Adding nodes
> >
> >
> >
> > 86Gb (that's gigabits, which is 10.75GB, gigabytes) taking an entire day
> seems obviously too long. I would check the network bandwidth, disk IO and
> CPU usage and find out what is the bottleneck.
> >
> > On 07/07/2022 15:48, Marc Hoppins wrote:
> >> Hi all,
> >>
> >> Cluster of 2 DC and 24 nodes
> >>
> >> DC1 (RF3) = 12 nodes, 16 tokens each
> >> DC2 (RF3) = 12 nodes, 16 tokens each
> >>
> >> Adding 12 more nodes to DC1: I installed Cassandra (version is the same
> across all nodes) but, after the first node added, I couldn't seem to add
> any further nodes.
> >>
> >> I check nodetool status and the newly added node is UJ. It remains this
> way all day and only 86Gb of data is added to the node over the entire day
> (probably not yet complete).  This seems a little slow and, more than a
> little inconvenient to only be able to add one node at a time - or at least
> one node every 2 minutes.  When the cluster was created, I timed each node
> from service start to status UJ (having a UUID) and it was around 120
> seconds.  Of course there was no data.
> >>
> >> Is it possible I have some setting not correctly tuned?
> >>
> >> Thanks
> >>
> >> Marc
>
>
>
>
>
>
>
>
>
