Re: Replication to second data center with different number of nodes
Sharing my experience here.

1) Never had any issues with different-sized DCs. If the hardware is the same, keep num_tokens at 256.
2) In most cases I keep the 256 vnodes and see no performance problems (and when problems are triggered, the cause is not the vnode count).

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Mar 30, 2015 at 6:31 AM, Anishek Agarwal wrote:
> When you said a larger number of tokens has a query performance hit, is it read or write performance?
Re: Replication to second data center with different number of nodes
Colin,

When you said a larger number of tokens has a query performance hit, is it read or write performance? Also, if you have any links you could share to shed some light on this, that would be great.

Thanks
Anishek

On Sun, Mar 29, 2015 at 2:20 AM, Colin Clark wrote:
> I typically use a number a lot lower than 256, usually less than 20, for num_tokens, as a larger number has historically had a dramatic impact on query performance.
Re: Replication to second data center with different number of nodes
I typically use a number a lot lower than 256, usually less than 20, for num_tokens, as a larger number has historically had a dramatic impact on query performance.

—
Colin Clark
co...@clark.ws
+1 612-859-6129
skype colin.p.clark

> On Mar 28, 2015, at 3:46 PM, Eric Stevens wrote:
> If you're curious about how Cassandra knows how to replicate data in the remote DC, it's the same as in the local DC [...]
Re: Replication to second data center with different number of nodes
If you're curious about how Cassandra knows how to replicate data in the remote DC: it's the same as in the local DC. Replication is independent in each, and you can even set a different replication strategy per keyspace per datacenter. Nodes in each DC take up num_tokens positions on a ring, each partition key is mapped to a position on that ring, and whoever owns that part of the ring is the primary for that data. Then (oversimplified) r-1 adjacent nodes become replicas for that same data.

On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles wrote:
> So go with a default 256, and leave initial token empty.
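The ring walk described above can be sketched as a toy model. This is purely illustrative, not Cassandra's actual code: md5 stands in for Cassandra's Murmur3 partitioner, and the node names and helper functions are made up.

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Hash a partition key to a position on the ring.
    # (Cassandra uses Murmur3; md5 is only a stand-in for illustration.)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def replicas(ring, key, rf):
    """ring: sorted list of (token, node) pairs, where each node appears
    num_tokens times (its vnodes). Returns the rf replica nodes for key:
    the primary is whichever node owns the first token at or after the
    key's position, then (oversimplified) the next distinct nodes walking
    clockwise become the remaining rf - 1 replicas."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, token(key)) % len(ring)
    chosen = []
    while len(chosen) < rf:
        node = ring[i % len(ring)][1]
        if node not in chosen:  # skip further vnodes of already-chosen nodes
            chosen.append(node)
        i += 1
    return chosen

# Three nodes with 8 vnodes each:
ring = sorted((token(f"{n}-{v}"), n)
              for n in ("node1", "node2", "node3") for v in range(8))
print(replicas(ring, "some-partition-key", 2))
```

Each DC has its own such ring walk, which is why the replica placement in the second DC does not depend on the first DC's node count.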
Re: Replication to second data center with different number of nodes
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

So go with the default of 256, and leave initial_token empty:

num_tokens: 256
# initial_token:

Cassandra will always give each node the same number of tokens; the only time you might want to vary this is if your instances are of different sizing/capability, which is itself a bad scenario.

From: Björn Hachmann
Date: Friday, 27 March 2015 12:11
Subject: Re: Replication to second data center with different number of nodes
> How does Cassandra know it has to replicate 1/3 of all keys to each single node in the second DC?

Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of Sky plc and Sky International AG and are used under licence.
Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc (Registration No. 2247735). All of the companies mentioned in this paragraph are incorporated in England and Wales and share the same registered office at Grant Way, Isleworth, Middlesex TW7 5QD.
Re: Replication to second data center with different number of nodes
2015-03-27 11:58 GMT+01:00 Sibbald, Charles:
> Cassandra’s Vnodes config

Thank you. Yes, we are using vnodes! The num_tokens parameter controls the number of vnodes assigned to a specific node.

It might be that I am seeing problems where there are none.

Let me rephrase my question: how does Cassandra know it has to replicate 1/3 of all keys to each single node in the second DC? I can see two ways:

1. It has to be configured explicitly.
2. It is derived from the number of nodes available in the data center at the time `nodetool rebuild` is started.

Kind regards
Björn
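If it is option 1, I assume the relevant place would be the per-DC replica counts in the keyspace definition rather than num_tokens; something like the following (keyspace and DC names are placeholders):

```cql
-- Three replicas in each DC, regardless of how many nodes each DC has.
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 3
};
```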
Re: Replication to second data center with different number of nodes
I would recommend you utilise Cassandra’s vnodes config and let it manage this itself. It will create and manage the tokens all on its own, which allows quick and easy scaling and bootstrapping.

From: Björn Hachmann
Date: Friday, 27 March 2015 10:40
Subject: Replication to second data center with different number of nodes
> Do I have to provide appropriate values for num_tokens dependent on the number of nodes per data center, or is this handled somehow by the NetworkTopologyStrategy?
Replication to second data center with different number of nodes
Hi,

we currently plan to add a second data center to our Cassandra cluster. I have read about this procedure in the documentation (e.g. https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html), but at least one question remains: do I have to provide appropriate values for num_tokens dependent on the number of nodes per data center, or is this handled somehow by the NetworkTopologyStrategy?

Example: we currently have 12 nodes, each covering 256 tokens. Our second data center will have three nodes only. Do I have to set num_tokens to 1024 (12*256/3) for the nodes in that DC?

Thank you very much for your valuable input!

Kind regards
Björn Hachmann
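For reference, the arithmetic in my example is simply DC1's token total spread over DC2's nodes; this only checks the numbers, not whether matching the token totals is actually necessary:

```python
# 12 nodes x 256 vnodes in DC1; the same token total spread over 3 nodes.
dc1_nodes, vnodes_per_node, dc2_nodes = 12, 256, 3
tokens_per_dc2_node = dc1_nodes * vnodes_per_node // dc2_nodes
print(tokens_per_dc2_node)  # 1024
```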