Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread onmstester onmstester
The problem is that I have defined too many racks in my cluster (because I have 
multiple Cassandra nodes on a single server, I defined each physical server 
as a separate rack), and because I hadn't heard of any "one seed per rack" 
rule before the TLP article (the only rule about seed nodes I had in mind 
was: "3-4 seed nodes in the cluster is enough; more is unnecessary and 
hurts performance"), I have always set up my clusters with 3-4 seed nodes.



I already have a cluster set up with the wrong mechanism (just one seed node 
with initial_token, then bootstrapping the other nodes one after another), 
and it seems to be working: it's almost balanced, and when I unplug a whole 
rack, writes and reads still work with no errors (using CL=ONE).

So what would be the problem? Is it catastrophic not to set manual tokens 
on the seed node of every rack?

I assume that when I define racks, whatever happens, Cassandra never puts two 
copies of my data in a single rack? (Right now that's my main concern, because 
I'm OK with my cluster's balanced load.)
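For reference, a simplified sketch of rack-aware replica placement as NetworkTopologyStrategy performs it within one datacenter, as we understand it: walk the ring clockwise from the partition's token, take one node from each rack not yet used, and accept a second node from an already-used rack only when the datacenter has fewer racks than the replication factor. The ring format and the `place_replicas` name are illustrative assumptions rather than Cassandra's API, and multi-datacenter handling is left out.

```python
from bisect import bisect_left

def place_replicas(ring, key_token, rf):
    """Simplified, single-DC sketch of NetworkTopologyStrategy replica placement.

    ring: list of (token, node, rack) tuples sorted by token (hypothetical format).
    Returns the replica nodes for a partition whose token is key_token.
    """
    tokens = [token for token, _, _ in ring]
    rack_count = len({rack for _, _, rack in ring})
    # Only when there are fewer racks than RF may a rack hold a second replica.
    repeats_allowed = max(rf - rack_count, 0)
    # The first token >= key_token owns the partition; wrap to 0 past the end.
    start = bisect_left(tokens, key_token) % len(ring)
    replicas, seen_racks = [], set()
    for step in range(len(ring)):
        _, node, rack = ring[(start + step) % len(ring)]
        if node in replicas:
            continue  # with vnodes, one node appears under several tokens
        if rack not in seen_racks:
            seen_racks.add(rack)
            replicas.append(node)
        elif repeats_allowed > 0:
            repeats_allowed -= 1
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


three_racks = [(-6000, "n1", "r1"), (-4000, "n2", "r1"), (-2000, "n3", "r2"),
               (0, "n4", "r2"), (2000, "n5", "r3"), (4000, "n6", "r3")]
two_racks = [(0, "a", "r1"), (10, "b", "r1"), (20, "c", "r2")]
print(place_replicas(three_racks, 100, 3))  # ['n5', 'n1', 'n3']: one per rack
print(place_replicas(two_racks, 5, 3))      # ['b', 'c', 'a']: r1 holds two copies
```

With at least as many racks as the replication factor, each replica lands in a different rack; in the two-rack example one rack necessarily holds two copies.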



 On Mon, 06 May 2019 07:17:14 +0430 Anthony Grasso 
 wrote 



Hi



If you are planning on setting up a new cluster with 
allocate_tokens_for_keyspace, then yes, you will need one seed node per rack. 
As Jon mentioned in a previous email, you must manually specify the token range 
for each seed node. This can be done using the initial_token setting.



The article you are referring to 
(https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html)
 includes Python code that calculates the token ranges for each of the seed 
nodes. When calling that Python code, you must specify the number of vnodes 
(tokens per node) and the number of racks.



Regards,

Anthony

On Sat, 4 May 2019 at 19:14, onmstester onmstester 
 wrote:

I just read this article by tlp:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

 

Noticed that:

>>We will need to set the tokens for the seed nodes in each rack manually. This 
>>is to prevent each node from randomly calculating its own token ranges



 But until now, I was using this recommendation to set up a new cluster:

>>

You'll want to set them explicitly using: python -c 'print( [str(((2**64 // 4) * 
i) - 2**63) for i in range(4)])'


After you fire up the first seed, create a keyspace using RF=3 (or whatever 
you're planning on using) and set allocate_tokens_for_keyspace to that keyspace 
in your config, and join the rest of the nodes. That gives even
distribution.
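A note on the one-liner above: on Python 3, `/` is true division and returns a float, so a version written with `/` prints tokens such as '-9.223372036854776e+18' instead of exact integers. Integer division (`//`) keeps every token exact on both Python 2 and 3. A minimal sketch, assuming Murmur3Partitioner's token range of -2**63 to 2**63 - 1 and num_tokens: 1; the `initial_tokens` name is ours.

```python
def initial_tokens(node_count):
    """Evenly spaced single tokens for node_count nodes (num_tokens: 1)."""
    step = 2**64 // node_count  # '//' keeps the arithmetic in exact integers
    return [str(step * i - 2**63) for i in range(node_count)]

print(initial_tokens(4))
# ['-9223372036854775808', '-4611686018427387904', '0', '4611686018427387904']
```

`initial_tokens(3)` reproduces the three tokens shown in the `nodetool status` output quoted further down this digest.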

I've defined plenty of racks in my cluster (and only 3 seed nodes). Should I 
have a seed node per rack and use initial_token for all of the seed nodes, or 
would just one seed node with initial_token be OK?

Best Regards

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Hi

If you are planning on setting up a new cluster with
allocate_tokens_for_keyspace, then yes, you will need one seed node per
rack. As Jon mentioned in a previous email, you must manually specify the
token range for *each* seed node. This can be done using the initial_token
setting.

The article you are referring to (
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html)
includes Python code that calculates the token ranges for each of the seed
nodes. When calling that Python code, you must specify the number of vnodes
(tokens per node) and the number of racks.
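As an illustration of that kind of calculation (not the article's actual script), here is a sketch that spreads num_tokens * num_racks tokens evenly over the Murmur3Partitioner range and deals them out round-robin, so each rack's seed node gets its own evenly spaced set. The function name and the round-robin scheme are assumptions; each output string is meant to be used as a comma-separated initial_token value.

```python
RING_MIN = -(2**63)  # lowest Murmur3Partitioner token
RING_SIZE = 2**64    # number of distinct tokens on the ring

def seed_tokens(num_tokens, num_racks):
    """Return {rack_index: "t1,t2,..."} with one token list per rack's seed node.

    All num_tokens * num_racks tokens are evenly spaced around the ring and
    dealt out round-robin, so adjacent tokens belong to different racks.
    """
    total = num_tokens * num_racks
    step = RING_SIZE // total
    per_rack = {rack: [] for rack in range(num_racks)}
    for i in range(total):
        per_rack[i % num_racks].append(RING_MIN + i * step)
    # initial_token accepts a comma-separated list when num_tokens > 1
    return {rack: ",".join(str(t) for t in tokens) for rack, tokens in per_rack.items()}

for rack, value in seed_tokens(num_tokens=4, num_racks=3).items():
    print(f"seed in rack {rack}: initial_token: {value}")
```

Once the seeds are up with these tokens, the remaining nodes can rely on allocate_tokens_for_keyspace as described in this thread.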

Regards,
Anthony

On Sat, 4 May 2019 at 19:14, onmstester onmstester
 wrote:

> I just read this article by tlp:
>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
> Noticed that:
> >>We will need to set the tokens for the seed nodes in each rack
> manually. This is to prevent each node from randomly calculating its own
> token ranges
>
>  But until now, I was using this recommendation to set up a new cluster:
> >>
>
> You'll want to set them explicitly using: python -c 'print( [str(((2**64 // 4) 
> * i) - 2**63) for i in range(4)])'
>
>
> After you fire up the first seed, create a keyspace using RF=3 (or whatever 
> you're planning on using) and set allocate_tokens_for_keyspace to that 
> keyspace in your config, and join the rest of the nodes. That gives even
> distribution.
>
> I've defined plenty of racks in my cluster (and only 3 seed nodes). Should
> I have a seed node per rack and use initial_token for all of the seed nodes,
> or would just one seed node with initial_token be OK?
> Best Regards
>
>
>


Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Anthony Grasso
Good idea, Jeff. I can add that in if you like. Do we have a ticket for it,
or should I just raise one?

On Mon, 6 May 2019 at 03:49, Jeff Jirsa  wrote:

> Picking an ideal allocation for N seed nodes and M vnodes per seed is
> probably something we should add as a little python script or similar in
> /tools/ to make this easier. Then let the auto allocation stuff kick in
> after that.
>
>
> > On May 5, 2019, at 8:23 AM, Jon Haddad  wrote:
> >
> > I mean you'd want to set up the initial tokens for the first 3 nodes
> > of your cluster, which are usually the seed nodes.
> >
> >
> > On Sat, May 4, 2019 at 8:31 PM onmstester onmstester
> >  wrote:
> >>
> >> So do you mean setting tokens for only one node (one of the seed nodes)
> is enough?
> >> I can't see any problem with this mechanism (only one manual token
> assignment at cluster setup), but the article was also trying to set up a
> balanced cluster, and the way it insists on manual token
> assignment for multiple seed nodes confused me.
> >>
> >>
> >>
> >>
> >>  Forwarded message 
> >> From: Jon Haddad 
> >> To: 
> >> Date: Sat, 04 May 2019 22:10:39 +0430
> >> Subject: Re: How to set up a cluster with allocate_tokens_for_keyspace?
> >>  Forwarded message 
> >>
> >> That line is only relevant for when you're starting your cluster and
> >> you need to define your initial tokens in a non-random way. Random
> >> token distribution doesn't work very well when you only use 4 tokens.
> >>
> >> Once you get the cluster set up you don't need to specify tokens
> >> anymore, you can just use allocate_tokens_for_keyspace.
> >>
> >> On Sat, May 4, 2019 at 2:14 AM onmstester onmstester
> >>  wrote:
> >>>
> >>> I just read this article by tlp:
> >>>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >>>
> >>> Noticed that:
> > We will need to set the tokens for the seed nodes in each rack
> manually. This is to prevent each node from randomly calculating its own
> token ranges
> >>>
> >>> But until now, I was using this recommendation to set up a new cluster:
> >
> >>>
> >>> You'll want to set them explicitly using: python -c 'print(
> [str(((2**64 // 4) * i) - 2**63) for i in range(4)])'
> >>>
> >>>
> >>> After you fire up the first seed, create a keyspace using RF=3 (or
> whatever you're planning on using) and set allocate_tokens_for_keyspace to
> that keyspace in your config, and join the rest of the nodes. That gives
> even
> >>> distribution.
> >>>
> >>> I've defined plenty of racks in my cluster (and only 3 seed nodes).
> Should I have a seed node per rack and use initial_token for all of the
> seed nodes, or would just one seed node with initial_token be OK?
> >>>
> >>> Best Regards
> >>>
> >>>
> >>
> >> -
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>
> >>
> >>
> >
> >
>
>
>


Re: Re: Re: how to configure the Token Allocation Algorithm

2019-05-05 Thread Anthony Grasso
Hi Jean,

Good question. I think that sentence is slightly confusing and here is why:

If the cluster's tokens are already evenly distributed and there are no
plans to expand the cluster, then applying the allocate_tokens_for_keyspace
setting has no real practical value.

If the cluster has tokens that are unevenly distributed and there are plans
to expand the cluster, then it may be worth using the
allocate_tokens_for_keyspace setting when adding a new node to the cluster.
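To check whether a cluster's tokens are already evenly distributed, one option is to compute how much of the ring each node's tokens cover. A minimal sketch, assuming Murmur3Partitioner and counting only primary ranges (replication is ignored); the token lists could be copied from `nodetool ring` output, and the helper names are hypothetical.

```python
RING_SIZE = 2**64  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1

def ownership(node_tokens):
    """Fraction of the ring each node owns as primary range (RF ignored).

    node_tokens maps node -> list of tokens. Each token owns the range from
    the previous token on the ring (exclusive) up to itself (inclusive).
    """
    ring = sorted((token, node) for node, tokens in node_tokens.items() for token in tokens)
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    owned = dict.fromkeys(node_tokens, 0)
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - previous) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}

def imbalance(node_tokens):
    """Largest share divided by smallest share; 1.0 means perfectly even."""
    shares = ownership(node_tokens).values()
    return max(shares) / min(shares)

even = {"n1": [-9223372036854775808], "n2": [-3074457345618258603], "n3": [3074457345618258602]}
skewed = {"a": [0], "b": [2**62], "c": [2**63 - 1]}
print(round(imbalance(even), 6), round(imbalance(skewed), 6))
```

A ratio close to 1.0 suggests there is little to gain from re-running allocation; a large ratio is the case where allocate_tokens_for_keyspace helps for new nodes.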

Looking back on that sentence, I think it should probably read:

*"However, therein lies the problem, for existing clusters using this
setting is easy, as a keyspace already exists"*


If you think that wording gives better clarification, I'll go back and
update the post when I have time. Let me know what you think.

Regards,
Anthony

On Mon, 29 Apr 2019 at 18:45, Jean Carlo  wrote:

> Hello Anthony,
>
> Indeed, I did not start the seed of every rack first. Thank you for
> the post. I believe this is something important to have as official
> documentation on cassandra.apache.org. This issue, like many others, is not
> documented properly.
>
> Of course I find The Last Pickle's blog very useful in these matters, but
> having proper documentation on how to start a fresh Cassandra cluster
> is essential.
>
> I have one question about your post. When you mention
> "*However, therein lies the problem, for existing clusters updating this
> setting is easy, as a keyspace already exists*",
> what is the benefit of using allocate_tokens_for_keyspace in a cluster
> with data if its tokens are already distributed? In the worst-case
> scenario, the cluster is already unbalanced.
>
>
> Cheers
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso 
> wrote:
>
>> Hi Jean,
>>
>> It sounds like there are no nodes in one of the racks for the eu-west-3
>> datacenter. What does the output of nodetool status look like currently?
>>
>> Note, you will need to start a node in each rack before creating the
>> keyspace. I wrote a blog post with the procedure to set up a new cluster
>> using the predictive token allocation algorithm:
>> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
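The ConfigurationException quoted further down ("the number of racks 2 in datacenter eu-west-3 is lower than its replication factor 3") comes from the token allocator's rack check. A rough sketch of that check as we understand it: with RF above 1, the allocator needs either a single rack or at least RF racks already known in the datacenter, which is why a node should be running in each rack before allocation relies on the keyspace. The function, its inputs, the RF-1 branch and the returned labels are assumptions for illustration, not Cassandra's code.

```python
def token_allocation_mode(known_nodes, dc, rf):
    """Rough sketch of the rack check behind the quoted ConfigurationException.

    known_nodes: iterable of (node, datacenter, rack) for nodes already in the ring.
    Returns a label for the allocation mode, or raises ValueError.
    """
    if rf <= 1:
        return "per-node"  # assumed: without replication, racks are not considered
    racks = {rack for _, node_dc, rack in known_nodes if node_dc == dc}
    if len(racks) >= rf:
        return "rack-aware"
    if len(racks) == 1:
        return "per-node"  # a single rack: every node is treated separately
    raise ValueError(
        f"the number of racks {len(racks)} in datacenter {dc} "
        f"is lower than its replication factor {rf}"
    )

# Hypothetical topology: only two of the three racks have a node so far.
nodes = [("192.0.0.1", "eu-west-3", "rack1"), ("192.0.0.2", "eu-west-3", "rack2")]
try:
    token_allocation_mode(nodes, "eu-west-3", 3)
except ValueError as err:
    print("allocation would fail:", err)
nodes.append(("192.0.0.3", "eu-west-3", "rack3"))  # hypothetical third rack's node
print(token_allocation_mode(nodes, "eu-west-3", 3))  # rack-aware
```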
>>
>> Regards,
>> Anthony
>>
>> On Fri, 26 Apr 2019 at 19:53, Jean Carlo 
>> wrote:
>>
>>> While creating a fresh new cluster in AWS using this procedure, I hit this
>>> problem when bootstrapping the second rack of a 6-machine cluster
>>> with 3 racks and a keyspace with RF 3:
>>>
>>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>>> 7660890915606146375, -5329427405842523680]
>>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>>> replication factor 3.
>>>
>>> Has anyone run into this problem?
>>>
>>> I am not quite sure why I have this, since my cluster has 3 racks.
>>>
>>> Cluster Information:
>>> Name: test
>>> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>>> DynamicEndPointSnitch: enabled
>>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>> Schema versions:
>>> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>>
>>>
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>>
>>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
>>> wrote:
>>>
 Hi folks,

 What about adding new keyspaces to the existing cluster, e.g. test_2 with
 the same RF?

 Will it use the same logic as the existing keyspace test? Or should I
 restart the nodes and add the new keyspace to cassandra.yaml?

 Thanks.

 On Tue, 2 Oct 2018 at 10:28, Varun Barala
 wrote:

> Hi,
>
> Managing `initial_token` yourself gives you more control over
> scale-in and scale-out.
> Let's say you have a three-node cluster with `num_tokens: 1`.
>
> And your initial ranges look like this:
>
> Datacenter: datacenter1
> =======================
> Address    Rack   Status  State   Load       Owns    Token
>                                                      3074457345618258602
> 127.0.0.1  rack1  Up      Normal  98.96 KiB  66.67%  -9223372036854775808
> 127.0.0.2  rack1  Up      Normal  98.96 KiB  66.67%  -3074457345618258603
> 127.0.0.3  rack1  Up      Normal  98.96 KiB  66.67%  3074457345618258602
>
> Now let's say you want to scale out the cluster to twice the current
> throughput (meaning you are adding 3 more nodes).
>
> If you are using AWS EBS volumes then you 
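The quoted message is cut off before it describes the scale-out step, so as an illustration only: with num_tokens: 1, one way to double a balanced three-node ring is to give each new node the token halfway through an existing range. A minimal sketch, assuming Murmur3Partitioner; the `midpoint_tokens` name is hypothetical.

```python
RING_MAX = 2**63 - 1  # highest Murmur3Partitioner token
RING_SIZE = 2**64

def midpoint_tokens(tokens):
    """Token halfway between each token and the next one clockwise.

    Giving these to new nodes splits every existing range in half, which
    doubles the node count while keeping single-token ownership even.
    """
    ring = sorted(tokens)
    new_tokens = []
    for i, token in enumerate(ring):
        nxt = ring[(i + 1) % len(ring)]
        gap = (nxt - token) % RING_SIZE  # the last range wraps past RING_MAX
        mid = token + gap // 2
        if mid > RING_MAX:               # fold back into the signed token range
            mid -= RING_SIZE
        new_tokens.append(mid)
    return new_tokens

existing = [-9223372036854775808, -3074457345618258603, 3074457345618258602]
print(midpoint_tokens(existing))
```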

Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Jeff Jirsa
Picking an ideal allocation for N seed nodes and M vnodes per seed is probably 
something we should add as a little python script or similar in /tools/ to make 
this easier. Then let the auto allocation stuff kick in after that.


> On May 5, 2019, at 8:23 AM, Jon Haddad  wrote:
> 
> I mean you'd want to set up the initial tokens for the first 3 nodes
> of your cluster, which are usually the seed nodes.
> 
> 
> On Sat, May 4, 2019 at 8:31 PM onmstester onmstester
>  wrote:
>> 
>> So do you mean setting tokens for only one node (one of the seed nodes) is 
>> enough?
>> I can't see any problem with this mechanism (only one manual token 
>> assignment at cluster setup), but the article was also trying to set up a 
>> balanced cluster, and the way it insists on manual token assignment 
>> for multiple seed nodes confused me.
>> 
>> 
>> 
>> 
>>  Forwarded message 
>> From: Jon Haddad 
>> To: 
>> Date: Sat, 04 May 2019 22:10:39 +0430
>> Subject: Re: How to set up a cluster with allocate_tokens_for_keyspace?
>>  Forwarded message 
>> 
>> That line is only relevant for when you're starting your cluster and
>> you need to define your initial tokens in a non-random way. Random
>> token distribution doesn't work very well when you only use 4 tokens.
>> 
>> Once you get the cluster set up you don't need to specify tokens
>> anymore, you can just use allocate_tokens_for_keyspace.
>> 
>> On Sat, May 4, 2019 at 2:14 AM onmstester onmstester
>>  wrote:
>>> 
>>> I just read this article by tlp:
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>> 
>>> Noticed that:
> We will need to set the tokens for the seed nodes in each rack manually. 
> This is to prevent each node from randomly calculating its own token 
> ranges
>>> 
>>> But until now, I was using this recommendation to set up a new cluster:
>
>>>
>>> You'll want to set them explicitly using: python -c 'print( [str(((2**64 // 
>>> 4) * i) - 2**63) for i in range(4)])'
>>> 
>>> 
>>> After you fire up the first seed, create a keyspace using RF=3 (or whatever 
>>> you're planning on using) and set allocate_tokens_for_keyspace to that 
>>> keyspace in your config, and join the rest of the nodes. That gives even
>>> distribution.
>>> 
>>> I've defined plenty of racks in my cluster (and only 3 seed nodes). Should 
>>> I have a seed node per rack and use initial_token for all of the seed nodes, 
>>> or would just one seed node with initial_token be OK?
>>> 
>>> Best Regards
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 




Re: Re: How to set up a cluster with allocate_tokens_for_keyspace?

2019-05-05 Thread Jon Haddad
I mean you'd want to set up the initial tokens for the first 3 nodes
of your cluster, which are usually the seed nodes.
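A quick way to see why the first nodes' tokens are worth setting explicitly (and why random token distribution works poorly with only 4 tokens, as noted in the quoted message): simulate random tokens for a few nodes and compare their ownership with evenly spaced tokens. A minimal sketch, assuming Murmur3Partitioner, primary ranges only, and uniformly random tokens as a stand-in for random allocation; all names are hypothetical. Rerun with different seeds to see how much the random case varies.

```python
import random

RING_MIN = -(2**63)  # Murmur3Partitioner range: -2**63 .. 2**63 - 1
RING_SIZE = 2**64

def ownership(node_tokens):
    """Primary-range share of the ring per node (replication ignored)."""
    ring = sorted((t, n) for n, tokens in node_tokens.items() for t in tokens)
    owned = dict.fromkeys(node_tokens, 0)
    for i, (token, node) in enumerate(ring):
        owned[node] += (token - ring[i - 1][0]) % RING_SIZE  # i == 0 wraps around
    return {n: size / RING_SIZE for n, size in owned.items()}

def random_tokens(nodes, num_tokens, rng):
    """Uniformly random tokens: a stand-in for allocation without initial_token."""
    return {n: [rng.randrange(RING_MIN, RING_MIN + RING_SIZE) for _ in range(num_tokens)]
            for n in nodes}

def even_tokens(nodes, num_tokens):
    """Evenly spaced tokens, dealt round-robin so each node's tokens are spread out."""
    step = RING_SIZE // (len(nodes) * num_tokens)
    return {n: [RING_MIN + (t * len(nodes) + k) * step for t in range(num_tokens)]
            for k, n in enumerate(nodes)}

def spread(shares):
    """Largest share / smallest share; 1.0 means perfectly balanced."""
    return max(shares.values()) / min(shares.values())

nodes = ["node1", "node2", "node3"]
print("random:", spread(ownership(random_tokens(nodes, 4, random.Random(1)))))
print("even:  ", spread(ownership(even_tokens(nodes, 4))))
```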


On Sat, May 4, 2019 at 8:31 PM onmstester onmstester
 wrote:
>
> So do you mean setting tokens for only one node (one of the seed nodes) is 
> enough?
> I can't see any problem with this mechanism (only one manual token 
> assignment at cluster setup), but the article was also trying to set up a 
> balanced cluster, and the way it insists on manual token assignment 
> for multiple seed nodes confused me.
>
>
>
>
>  Forwarded message 
> From: Jon Haddad 
> To: 
> Date: Sat, 04 May 2019 22:10:39 +0430
> Subject: Re: How to set up a cluster with allocate_tokens_for_keyspace?
>  Forwarded message 
>
> That line is only relevant for when you're starting your cluster and
> you need to define your initial tokens in a non-random way. Random
> token distribution doesn't work very well when you only use 4 tokens.
>
> Once you get the cluster set up you don't need to specify tokens
> anymore, you can just use allocate_tokens_for_keyspace.
>
> On Sat, May 4, 2019 at 2:14 AM onmstester onmstester
>  wrote:
> >
> > I just read this article by tlp:
> > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> >
> > Noticed that:
> > >>We will need to set the tokens for the seed nodes in each rack manually. 
> > >>This is to prevent each node from randomly calculating its own token 
> > >>ranges
> >
> > But until now, I was using this recommendation to set up a new cluster:
> > >>
> >
> > You'll want to set them explicitly using: python -c 'print( [str(((2**64 // 
> > 4) * i) - 2**63) for i in range(4)])'
> >
> >
> > After you fire up the first seed, create a keyspace using RF=3 (or whatever 
> > you're planning on using) and set allocate_tokens_for_keyspace to that 
> > keyspace in your config, and join the rest of the nodes. That gives even
> > distribution.
> >
> > I've defined plenty of racks in my cluster (and only 3 seed nodes). Should 
> > I have a seed node per rack and use initial_token for all of the seed nodes, 
> > or would just one seed node with initial_token be OK?
> >
> > Best Regards
> >
> >
>
>
>
>




Re: Priority in IN () cqlsh comand

2019-05-05 Thread Jon Haddad
Do separate queries for each partition you want.  There's no benefit
in using the IN() clause here, and performance is significantly worse
with multi-partition IN(), especially if the partitions are small.
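A minimal sketch of that approach, where the client rather than the IN() clause decides the order: run one single-partition query per month in the order you want and concatenate the rows. `fetch_partition` is a hypothetical callable standing in for executing the single-partition SELECT; the table and column names come from the question below.

```python
from typing import Callable, Iterable, List

# Hypothetical: runs
#   SELECT * FROM data WHERE nid = ? AND mm = ? AND tid = ? AND ts >= ? AND ts <= ?
# against one partition and returns its rows.
FetchPartition = Callable[[str, int, str, int, int], Iterable[dict]]

def rows_in_month_order(fetch_partition: FetchPartition, nid: str, months: List[int],
                        tid: str, ts_from: int, ts_to: int) -> List[dict]:
    """One query per partition, results concatenated in the caller's month order."""
    rows: List[dict] = []
    for mm in months:  # e.g. [201905, 201904] to get May before April
        rows.extend(fetch_partition(nid, mm, tid, ts_from, ts_to))
    return rows

# Usage with an in-memory stand-in instead of a real session:
fake_table = {201904: [{"mm": 201904, "ts": 155639500}],
              201905: [{"mm": 201905, "ts": 155699900}]}
rows = rows_in_month_order(lambda nid, mm, tid, lo, hi: fake_table[mm],
                           "value", [201905, 201904], "value2", 155639466, 155699946)
print([row["mm"] for row in rows])  # [201905, 201904]
```

With the DataStax Python driver, `fetch_partition` could wrap a prepared statement passed to `session.execute`; alternatively, issuing `session.execute_async` for every month first and then calling `.result()` on each future in the desired order keeps the queries concurrent while preserving the order you choose.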

On Sun, May 5, 2019 at 4:52 AM Soheil Pourbafrani  wrote:
>
> Hi,
>
> I want to run a cqlsh query on a Cassandra table using IN:
>
> SELECT * from data WHERE nid = 'value' AND mm IN (201905,201904) AND tid 
> = 'value2' AND ts >= 155639466 AND ts <= 155699946 ;
>
> The nid and mm columns are the partition key and ts is the clustering key.
> The problem is that Cassandra ignores the order of the IN list and 
> always returns the 201904 partition data first and the 201905 
> partition data after it, but I want the 201905 partition data to come first.
>
> Is there any solution for this?




Re: nodetool repair failing with "Validation failed in /X.X.X.X

2019-05-05 Thread shalom sagges
Hi Rhys,

I encountered this error after adding new SSTables to a cluster and running
nodetool refresh (v3.0.12).
The refresh worked, but after starting repairs on the cluster, I got the
"Validation failed in /X.X.X.X" error on the remote DC.
A rolling restart solved the issue for me.

Hope this helps!



On Sat, May 4, 2019 at 3:58 PM Rhys Campbell
 wrote:

>
> > Hello,
> >
> > I’m having issues running repair on an Apache Cassandra cluster. I’m
> getting "Failed creating a merkle tree" errors on the replication partner
> nodes. Does anyone have any experience with this? I am running 2.2.13.
> >
> > Further details here…
> https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-15109?filter=allopenissues
> >
> > Best,
> >
> > Rhys
>
>
>
>


Priority in IN () cqlsh comand

2019-05-05 Thread Soheil Pourbafrani
Hi,

I want to run a cqlsh query on a Cassandra table using IN:

SELECT * from data WHERE nid = 'value' AND mm IN (201905,201904) AND
tid = 'value2' AND ts >= 155639466 AND ts <= 155699946 ;

The nid and mm columns are the partition key and ts is the clustering key.
The problem is that Cassandra ignores the order of the IN list and
always returns the 201904 partition data first and the 201905
partition data after it, but I want the 201905 partition data to come first.

Is there any solution for this?