Re: Re: Re: how to configure the Token Allocation Algorithm

2019-05-05 Thread Anthony Grasso
Hi Jean,

Good question. I think that sentence is slightly confusing and here is why:

If the cluster's tokens are already evenly distributed and there are no
plans to expand the cluster, then applying the allocate_tokens_for_keyspace
setting has no real practical value.

If the cluster has tokens that are unevenly distributed and there are plans
to expand the cluster, then it may be worth using the
allocate_tokens_for_keyspace setting when adding a new node to the cluster.
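
For example, on the node being added, the relevant cassandra.yaml entries
would look something like this (the keyspace name and token count here are
only placeholders):

num_tokens: 8
allocate_tokens_for_keyspace: my_keyspace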

Looking back on that sentence, I think it should probably read:

*"However, therein lies the problem, for existing clusters using this
> setting is easy, as a keyspace already exists"*


If you think that wording is clearer, I'll go back and
update the post when I have time. Let me know what you think.

Regards,
Anthony

On Mon, 29 Apr 2019 at 18:45, Jean Carlo  wrote:

> Hello Anthony,
>
> Indeed, I did not start the seed of every rack first. Thank you for
> the post. I believe this is something important to have as official
> documentation on cassandra.apache.org. This issue, like many others, is not
> documented properly.
>
> Of course I find The Last Pickle blog very useful in these matters, but
> having proper documentation on how to start a fresh new Cassandra cluster
> is essential.
>
> I have one question about your post, where you mention
> "*However, therein lies the problem, for existing clusters updating this
> setting is easy, as a keyspace already exists*"
> What is the point of using allocate_tokens_for_keyspace in a cluster that
> already has data, if its tokens are already distributed? In the worst-case
> scenario, the cluster is already unbalanced.
>
>
> Cheers
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso 
> wrote:
>
>> Hi Jean,
>>
>> It sounds like there are no nodes in one of the racks for the eu-west-3
>> datacenter. What does the output of nodetool status look like currently?
>>
>> Note, you will need to start a node in each rack before creating the
>> keyspace. I wrote a blog post with the procedure to set up a new cluster
>> using the predictive token allocation algorithm:
>> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>> Regards,
>> Anthony
>>
>> On Fri, 26 Apr 2019 at 19:53, Jean Carlo 
>> wrote:
>>
>>> Creating a fresh new cluster in aws using this procedure, I got this
>>> problem once I am bootstrapping the second rack of the cluster of 6
>>> machines with 3 racks and a keyspace of rf 3
>>>
>>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>>> 7660890915606146375, -5329427405842523680]
>>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>>> replication factor 3.
>>>
>>> Someone got this problem ?
>>>
>>> I am not quite sure why I have this, since my cluster has 3 racks.
>>>
>>> Cluster Information:
>>> Name: test
>>> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>>> DynamicEndPointSnitch: enabled
>>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>> Schema versions:
>>> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>>
>>>
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>>
>>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
>>> wrote:
>>>
 Hi folks,

 What about adding new keyspaces in the existing cluster, test_2 with
 the same RF.

 Will it use the same logic as the existing keyspace test? Or should I
 restart the nodes and add the new keyspace to cassandra.yaml?

 Thanks.

 On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:

> Hi,
>
> Managing `initial_token` by yourself will give you more control over
> scale-in and scale-out.
> Let's say you have three node cluster with `num_token: 1`
>
> And your initial range looks like:-
>
> Datacenter: datacenter1
> ==
> AddressRackStatus State   LoadOwns
>Token
>
>3074457345618258602
>
> 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
>-9223372036854775808
> 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
>-3074457345618258603
> 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
>3074457345618258602
>
> Now let's say you want to scale out the cluster to twice the current
> throughput(means you are adding 3 more nodes)
>
> If you are using AWS EBS volumes then you 

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-29 Thread Jean Carlo
Hello Anthony,

Indeed, I did not start the seed of every rack first. Thank you for
the post. I believe this is something important to have as official
documentation on cassandra.apache.org. This issue, like many others, is not
documented properly.

Of course I find The Last Pickle blog very useful in these matters, but
having proper documentation on how to start a fresh new Cassandra cluster
is essential.

I have one question about your post, where you mention
"*However, therein lies the problem, for existing clusters updating this
setting is easy, as a keyspace already exists*"
What is the point of using allocate_tokens_for_keyspace in a cluster that
already has data, if its tokens are already distributed? In the worst-case
scenario, the cluster is already unbalanced.


Cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso 
wrote:

> Hi Jean,
>
> It sounds like there are no nodes in one of the racks for the eu-west-3
> datacenter. What does the output of nodetool status look like currently?
>
> Note, you will need to start a node in each rack before creating the
> keyspace. I wrote a blog post with the procedure to set up a new cluster
> using the predictive token allocation algorithm:
> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
> Regards,
> Anthony
>
> On Fri, 26 Apr 2019 at 19:53, Jean Carlo 
> wrote:
>
>> Creating a fresh new cluster in aws using this procedure, I got this
>> problem once I am bootstrapping the second rack of the cluster of 6
>> machines with 3 racks and a keyspace of rf 3
>>
>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>> 7660890915606146375, -5329427405842523680]
>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>> replication factor 3.
>>
>> Someone got this problem ?
>>
>> I am not quite sure why I have this, since my cluster has 3 racks.
>>
>> Cluster Information:
>> Name: test
>> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>> DynamicEndPointSnitch: enabled
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Schema versions:
>> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>
>>
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>>
>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
>> wrote:
>>
>>> Hi folks,
>>>
>>> What about adding new keyspaces in the existing cluster, test_2 with
>>> the same RF.
>>>
>>> Will it use the same logic as the existing keyspace test? Or should I
>>> restart the nodes and add the new keyspace to cassandra.yaml?
>>>
>>> Thanks.
>>>
>>> On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:
>>>
 Hi,

 Managing `initial_token` by yourself will give you more control over
 scale-in and scale-out.
 Let's say you have three node cluster with `num_token: 1`

 And your initial range looks like:-

 Datacenter: datacenter1
 ==
 AddressRackStatus State   LoadOwns
Token

  3074457345618258602

 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
-9223372036854775808
 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
-3074457345618258603
 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
3074457345618258602

 Now let's say you want to scale out the cluster to twice the current
 throughput(means you are adding 3 more nodes)

 If you are using AWS EBS volumes then you can use the same volumes and
 spin three more nodes by selecting midpoints of existing ranges which means
 your new nodes are already having data.
 Once you have mounted volumes on your new nodes:-
 * You need to delete every system table except schema related tables.
 * You need to generate system/local table by yourself which has
 `Bootstrap state` as completed and schema-version same as other existing
 nodes.
 * You need to remove extra data on all the machines using cleanup
 commands

 This is how you can scale out Cassandra cluster in the minutes. In case
 you want to add nodes one by one then you need to write some small tool
 which will always figure out the bigger range in the existing cluster and
 will split it into the half.

 However, I never tested it thoroughly but this should work
 conceptually. So here we are taking advantage of the fact that we have
 volumes(data) for the new node beforehand so we no need to 

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-28 Thread Anthony Grasso
Hi Jean,

It sounds like there are no nodes in one of the racks for the eu-west-3
datacenter. What does the output of nodetool status look like currently?

Note, you will need to start a node in each rack before creating the
keyspace. I wrote a blog post with the procedure to set up a new cluster
using the predictive token allocation algorithm:
http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
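
In outline (the keyspace name below is only an example), the order of
operations from the post is: bring up one node in each of the three racks
first, then create the keyspace you want the allocator to balance for, e.g.

CREATE KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-west-3': 3};

and only then bootstrap the remaining nodes with
allocate_tokens_for_keyspace: my_keyspace set in their cassandra.yaml. The
"number of racks 2 ... is lower than its replication factor 3" error appears
when a node tries to allocate tokens for an RF 3 keyspace while nodes are up
in only two racks.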

Regards,
Anthony

On Fri, 26 Apr 2019 at 19:53, Jean Carlo  wrote:

> Creating a fresh new cluster in aws using this procedure, I got this
> problem once I am bootstrapping the second rack of the cluster of 6
> machines with 3 racks and a keyspace of rf 3
>
> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
> 3265006217757525070, 5054577454645148534, 314677103601736696,
> 7660890915606146375, -5329427405842523680]
> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
> replication factor 3.
>
> Someone got this problem ?
>
> I am not quite sure why I have this, since my cluster has 3 racks.
>
> Cluster Information:
> Name: test
> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
> DynamicEndPointSnitch: enabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>
>
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
> wrote:
>
>> Hi folks,
>>
>> What about adding new keyspaces in the existing cluster, test_2 with the
>> same RF.
>>
>> Will it use the same logic as the existing keyspace test? Or should I
>> restart the nodes and add the new keyspace to cassandra.yaml?
>>
>> Thanks.
>>
>> On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:
>>
>>> Hi,
>>>
>>> Managing `initial_token` by yourself will give you more control over
>>> scale-in and scale-out.
>>> Let's say you have three node cluster with `num_token: 1`
>>>
>>> And your initial range looks like:-
>>>
>>> Datacenter: datacenter1
>>> ==
>>> AddressRackStatus State   LoadOwns
>>>  Token
>>>
>>>  3074457345618258602
>>>
>>> 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
>>>  -9223372036854775808
>>> 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
>>>  -3074457345618258603
>>> 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
>>>  3074457345618258602
>>>
>>> Now let's say you want to scale out the cluster to twice the current
>>> throughput(means you are adding 3 more nodes)
>>>
>>> If you are using AWS EBS volumes then you can use the same volumes and
>>> spin three more nodes by selecting midpoints of existing ranges which means
>>> your new nodes are already having data.
>>> Once you have mounted volumes on your new nodes:-
>>> * You need to delete every system table except schema related tables.
>>> * You need to generate system/local table by yourself which has
>>> `Bootstrap state` as completed and schema-version same as other existing
>>> nodes.
>>> * You need to remove extra data on all the machines using cleanup
>>> commands
>>>
>>> This is how you can scale out Cassandra cluster in the minutes. In case
>>> you want to add nodes one by one then you need to write some small tool
>>> which will always figure out the bigger range in the existing cluster and
>>> will split it into the half.
>>>
>>> However, I never tested it thoroughly but this should work conceptually.
>>> So here we are taking advantage of the fact that we have volumes(data) for
>>> the new node beforehand so we no need to bootstrap them.
>>>
>>> Thanks & Regards,
>>> Varun Barala
>>>
>>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <
>>> onmstes...@zoho.com> wrote:
>>>


 Sent using Zoho Mail 


  On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
 >* wrote 

 Hello again :),

 I thought a little bit more about this question, and I was actually
 wondering if something like this would work:

 Imagine 3 node cluster, and create them using:
 For the 3 nodes: `num_token: 4`
 Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
 4611686018427387901`
 Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
 1537228672809129299, 6148914691236517202`
 Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
 3074457345618258600, 7686143364045646503`

  If you know the initial size of your cluster, you can calculate the
 total number of tokens: number of nodes * vnodes and use the

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-26 Thread Jean Carlo
While creating a fresh new cluster in AWS using this procedure, I ran into this
problem when bootstrapping the second rack of a 6-machine cluster with 3 racks
and a keyspace with RF 3:

WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
3265006217757525070, 5054577454645148534, 314677103601736696,
7660890915606146375, -5329427405842523680]
ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Token allocation
failed: the number of racks 2 in datacenter eu-west-3 is lower than its
replication factor 3.

Has anyone else run into this problem?

I am not quite sure why I am getting this, since my cluster has 3 racks.

Cluster Information:
Name: test
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]



Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
wrote:

> Hi folks,
>
> What about adding new keyspaces in the existing cluster, test_2 with the
> same RF.
>
> Will it use the same logic as the existing keyspace test? Or should I
> restart the nodes and add the new keyspace to cassandra.yaml?
>
> Thanks.
>
> On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:
>
>> Hi,
>>
>> Managing `initial_token` by yourself will give you more control over
>> scale-in and scale-out.
>> Let's say you have three node cluster with `num_token: 1`
>>
>> And your initial range looks like:-
>>
>> Datacenter: datacenter1
>> ==
>> AddressRackStatus State   LoadOwns
>>  Token
>>
>>3074457345618258602
>> 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
>>  -9223372036854775808
>> 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
>>  -3074457345618258603
>> 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
>>  3074457345618258602
>>
>> Now let's say you want to scale out the cluster to twice the current
>> throughput(means you are adding 3 more nodes)
>>
>> If you are using AWS EBS volumes then you can use the same volumes and
>> spin three more nodes by selecting midpoints of existing ranges which means
>> your new nodes are already having data.
>> Once you have mounted volumes on your new nodes:-
>> * You need to delete every system table except schema related tables.
>> * You need to generate system/local table by yourself which has
>> `Bootstrap state` as completed and schema-version same as other existing
>> nodes.
>> * You need to remove extra data on all the machines using cleanup commands
>>
>> This is how you can scale out Cassandra cluster in the minutes. In case
>> you want to add nodes one by one then you need to write some small tool
>> which will always figure out the bigger range in the existing cluster and
>> will split it into the half.
>>
>> However, I never tested it thoroughly but this should work conceptually.
>> So here we are taking advantage of the fact that we have volumes(data) for
>> the new node beforehand so we no need to bootstrap them.
>>
>> Thanks & Regards,
>> Varun Barala
>>
>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester 
>> wrote:
>>
>>>
>>>
>>> Sent using Zoho Mail 
>>>
>>>
>>>  On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>>> >* wrote 
>>>
>>> Hello again :),
>>>
>>> I thought a little bit more about this question, and I was actually
>>> wondering if something like this would work:
>>>
>>> Imagine 3 node cluster, and create them using:
>>> For the 3 nodes: `num_token: 4`
>>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>>> 4611686018427387901`
>>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>>> 1537228672809129299, 6148914691236517202`
>>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>>> 3074457345618258600, 7686143364045646503`
>>>
>>>  If you know the initial size of your cluster, you can calculate the
>>> total number of tokens: number of nodes * vnodes and use the
>>> formula/python code above to get the tokens. Then use the first token for
>>> the first node, move to the second node, use the second token and repeat.
>>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>>> ```
>>> >>> number_of_tokens = 12
>>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>>> range(number_of_tokens)]
>>> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
>>> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
>>> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
>>> '6148914691236517202', '7686143364045646503']
>>> ```
>>>
>>>
>>> Using manual initial_token 

Re: Re: Re: how to configure the Token Allocation Algorithm

2019-01-24 Thread Ahmed Eljami
Hi folks,

What about adding a new keyspace, test_2, with the same RF to the existing
cluster?

Will it use the same logic as the existing keyspace test? Or should I
restart the nodes and add the new keyspace to cassandra.yaml?

Thanks.

On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:

> Hi,
>
> Managing `initial_token` by yourself will give you more control over
> scale-in and scale-out.
> Let's say you have three node cluster with `num_token: 1`
>
> And your initial range looks like:-
>
> Datacenter: datacenter1
> ==
> AddressRackStatus State   LoadOwns
>  Token
>
>3074457345618258602
> 127.0.0.1  rack1   Up Normal  98.96 KiB   66.67%
>  -9223372036854775808
> 127.0.0.2  rack1   Up Normal  98.96 KiB   66.67%
>  -3074457345618258603
> 127.0.0.3  rack1   Up Normal  98.96 KiB   66.67%
>  3074457345618258602
>
> Now let's say you want to scale out the cluster to twice the current
> throughput(means you are adding 3 more nodes)
>
> If you are using AWS EBS volumes then you can use the same volumes and
> spin three more nodes by selecting midpoints of existing ranges which means
> your new nodes are already having data.
> Once you have mounted volumes on your new nodes:-
> * You need to delete every system table except schema related tables.
> * You need to generate system/local table by yourself which has `Bootstrap
> state` as completed and schema-version same as other existing nodes.
> * You need to remove extra data on all the machines using cleanup commands
>
> This is how you can scale out Cassandra cluster in the minutes. In case
> you want to add nodes one by one then you need to write some small tool
> which will always figure out the bigger range in the existing cluster and
> will split it into the half.
>
> However, I never tested it thoroughly but this should work conceptually.
> So here we are taking advantage of the fact that we have volumes(data) for
> the new node beforehand so we no need to bootstrap them.
>
> Thanks & Regards,
> Varun Barala
>
> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester 
> wrote:
>
>>
>>
>> Sent using Zoho Mail 
>>
>>
>>  On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>> >* wrote 
>>
>> Hello again :),
>>
>> I thought a little bit more about this question, and I was actually
>> wondering if something like this would work:
>>
>> Imagine 3 node cluster, and create them using:
>> For the 3 nodes: `num_token: 4`
>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>> 4611686018427387901`
>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>> 1537228672809129299, 6148914691236517202`
>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>> 3074457345618258600, 7686143364045646503`
>>
>>  If you know the initial size of your cluster, you can calculate the
>> total number of tokens: number of nodes * vnodes and use the
>> formula/python code above to get the tokens. Then use the first token for
>> the first node, move to the second node, use the second token and repeat.
>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>> ```
>> >>> number_of_tokens = 12
>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>> range(number_of_tokens)]
>> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
>> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
>> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
>> '6148914691236517202', '7686143364045646503']
>> ```
>>
>>
>> Using manual initial_token (your idea), how could i add a new node to a
>> long running cluster (the procedure)?
>>
>>

-- 
Regards,

Ahmed ELJAMI


Re: Re: Re: how to configure the Token Allocation Algorithm

2018-10-02 Thread Varun Barala
Hi,

Managing `initial_token` by yourself will give you more control over
scale-in and scale-out.
Let's say you have a three-node cluster with `num_token: 1`.

And your initial ranges look like:

Datacenter: datacenter1
=======================
Address    Rack   Status  State   Load       Owns     Token
                                                      3074457345618258602
127.0.0.1  rack1  Up      Normal  98.96 KiB  66.67%  -9223372036854775808
127.0.0.2  rack1  Up      Normal  98.96 KiB  66.67%  -3074457345618258603
127.0.0.3  rack1  Up      Normal  98.96 KiB  66.67%   3074457345618258602

Now let's say you want to scale out the cluster to twice the current
throughput (meaning you are adding 3 more nodes).

If you are using AWS EBS volumes, then you can use the same volumes and spin
up three more nodes by selecting the midpoints of the existing ranges, which
means your new nodes already have data.
Once you have mounted the volumes on your new nodes:
* You need to delete every system table except the schema-related tables.
* You need to generate the system.local table yourself, with `Bootstrap
state` set to completed and the schema version the same as on the other
existing nodes.
* You need to remove the extra data on all the machines using cleanup commands.

This is how you can scale out a Cassandra cluster in minutes. If you want to
add nodes one by one, then you need to write a small tool which always finds
the biggest range in the existing cluster and splits it in half.

However, I have never tested this thoroughly, but it should work conceptually.
Here we are taking advantage of the fact that we already have the volumes
(data) for the new nodes beforehand, so we do not need to bootstrap them.
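
To make the midpoint step above concrete, here is a rough Python sketch
(illustration only; the three tokens are the ones from the example ring
above). It picks the midpoint of each existing range, taking care of the
range that wraps around the end of the Murmur3 token space:

```
# Rough sketch: pick the midpoint of each existing token range as the
# initial_token of a new node. Murmur3 tokens live in [-2**63, 2**63 - 1]
# and the last range wraps around, so the arithmetic is done modulo 2**64.
RING = 2**64

existing = sorted([-9223372036854775808, -3074457345618258603, 3074457345618258602])

new_tokens = []
for i, start in enumerate(existing):
    end = existing[(i + 1) % len(existing)]
    span = (end - start) % RING                       # size of the range, wrap-safe
    mid = (start + span // 2 + 2**63) % RING - 2**63  # midpoint, back in signed 64-bit space
    new_tokens.append(mid)

print(new_tokens)
# -> [-6148914691236517206, -1, 6148914691236517205]
```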

Thanks & Regards,
Varun Barala

On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester 
wrote:

>
>
> Sent using Zoho Mail 
>
>
>  On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
> >* wrote 
>
> Hello again :),
>
> I thought a little bit more about this question, and I was actually
> wondering if something like this would work:
>
> Imagine 3 node cluster, and create them using:
> For the 3 nodes: `num_token: 4`
> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
> 4611686018427387901`
> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
> 1537228672809129299, 6148914691236517202`
> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
> 3074457345618258600, 7686143364045646503`
>
>  If you know the initial size of your cluster, you can calculate the total
> number of tokens: number of nodes * vnodes and use the formula/python
> code above to get the tokens. Then use the first token for the first node,
> move to the second node, use the second token and repeat. In my case there
> is a total of 12 tokens (3 nodes, 4 tokens each)
> ```
> >>> number_of_tokens = 12
> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
> range(number_of_tokens)]
> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
> '6148914691236517202', '7686143364045646503']
> ```
>
>
> Using manual initial_token (your idea), how could i add a new node to a
> long running cluster (the procedure)?
>
>


Re: Re: Re: how to configure the Token Allocation Algorithm

2018-10-02 Thread onmstester onmstester
Sent using Zoho Mail

On Mon, 01 Oct 2018 18:36:03 +0330, Alain RODRIGUEZ wrote:

Hello again :),

I thought a little bit more about this question, and I was actually
wondering if something like this would work:

Imagine a 3-node cluster, and create the nodes using:
For the 3 nodes: `num_token: 4`
Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2, 4611686018427387901`
Node 2: `initial_token: -7686143364045646507, -3074457345618258604, 1537228672809129299, 6148914691236517202`
Node 3: `initial_token: -6148914691236517206, -1537228672809129303, 3074457345618258600, 7686143364045646503`

If you know the initial size of your cluster, you can calculate the total
number of tokens (number of nodes * vnodes) and use the formula/Python code
above to get the tokens. Then use the first token for the first node, move to
the second node, use the second token, and repeat. In my case there is a total
of 12 tokens (3 nodes, 4 tokens each):
```
>>> number_of_tokens = 12
>>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
'-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
'-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
'6148914691236517202', '7686143364045646503']
```

Using manual initial_token (your idea), how could I add a new node to a
long-running cluster (what is the procedure)?
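
For reference, the snippet above is Python 2, where `/` between integers is
integer division; under Python 3 the same calculation needs `//` to keep the
tokens as exact integers. A small sketch that also deals the tokens out
round-robin to the three nodes, as described above:

```
# Python 3 version of the calculation above; // keeps the tokens exact.
number_of_nodes = 3
vnodes = 4
number_of_tokens = number_of_nodes * vnodes  # 12

tokens = [((2**64 // number_of_tokens) * i) - 2**63 for i in range(number_of_tokens)]

# Deal the tokens out round-robin: node 1 takes tokens 0, 3, 6, 9, and so on.
for n in range(number_of_nodes):
    node_tokens = ", ".join(str(t) for t in tokens[n::number_of_nodes])
    print(f"Node {n + 1} initial_token: {node_tokens}")
```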

Re: Re: Re: how to configure the Token Allocation Algorithm

2018-10-01 Thread Alain RODRIGUEZ
sed. I would probably add a second
cluster when the first one is too big (hundreds of nodes) or split per
service/workflow for example. In practice, the operational complexity is
reduced by automated operations and/or good tooling to operate
efficiently.



On Mon, 1 Oct 2018 at 12:37, onmstester onmstester wrote:

> Thanks Alex,
> You are right, that would be a mistake.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>  Forwarded message 
> From : Oleksandr Shulgin 
> To : "User"
> Date : Mon, 01 Oct 2018 13:53:37 +0330
> Subject : Re: Re: how to configure the Token Allocation Algorithm
>  Forwarded message 
>
> On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester 
> wrote:
>
>
>
> What if, instead of running that Python and having one node with a non-vnode
> config, I remove the first seed node and re-add it after the cluster is fully
> up? Then the token ranges of the first seed node would also be assigned by
> the allocation algorithm.
>
>
> I think this is tricky because the random allocation of the very first
> tokens from the first seed affects the choice of tokens made by the
> algorithm on the rest of the nodes: it basically tries to divide the token
> ranges in more or less equal parts.  If your very first 8 tokens resulted
> in really bad balance, you are not going to remove that imbalance by
> removing the node; it would still have a lasting effect on the rest of
> your cluster.
>
> --
> Alex
>
>
>
>