Re: Cassandra on K8S

2020-08-03 Thread manish khandelwal
I am asking this because the only case where I can see an IP swap occurring is
when two Cassandra nodes are running on the same K8S host node. I am
evaluating how safe it is to run two Cassandra nodes on a single K8S host
node.

*Totally agree that swap is not the right word, but one node can still take
another node's IP.*

Regards
Manish




On Tue, Aug 4, 2020 at 10:07 AM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> But again, if some Cassandra node (pod) with a particular IP X is down and a
> second Cassandra node (pod) tries to take the IP X of the first Cassandra node,
> the second Cassandra node should fail to join the cluster, as the Cassandra
> cluster will complain that IP X is already occupied.
> In that sense an actual swap of IPs should not occur between two nodes, and
> data issues should not arise.
>
> I am asking this because the only case where I can see an IP swap occurring is
> when two Cassandra nodes are running on the same K8S host node. I am
> evaluating how safe it is to run two Cassandra nodes on a single K8S host
> node.
>
> Regards
> Manish
>
>
>
> On Tue, Aug 4, 2020 at 9:01 AM Christopher Bradford 
> wrote:
>
>> In *most* k8s environments each Kubernetes worker receives its own
>> dedicated CIDR range from the cluster’s CIDR space for allocating pod IP
>> addresses. The issue described can occur when a k8s worker goes down, then
>> comes back up and the pods are rescheduled, and either pod starts up with
>> another pod’s previously used IP.
>>
>> They don’t necessarily have to swap 1:1 (i.e. one pod could use the other’s
>> previous address while that pod receives a new address). Additionally, it’s
>> not a race condition of which container starts first. The k8s scheduler and
>> kubelet daemon assign IPs to pods.
>>
>> On Mon, Aug 3, 2020 at 11:14 PM manish khandelwal <
>> manishkhandelwa...@gmail.com> wrote:
>>
>>> I have started reading about how to deploy Cassandra with K8S. But as I
>>> read more, I feel there are a lot of challenges in running Cassandra on K8s.
>>> Some of the challenges I see are:
>>>
>>> 1. Pod IP identification - If the pods go down and their IPs change when
>>> they come back up, how is that handled, since we depend on the IPs of
>>> Cassandra nodes for internode as well as client-server communication?

>>>
>>> Strictly safe to change an IP to an IP that is unused.
>>> Strictly unsafe to use an IP that's already in the cluster (so if two
>>> pods go down, having the first pod that comes up grab the IP of the second
>>> pod is strictly dangerous and will violate consistency and maybe lose
>>> data).
>>>
>>> *For point 2 (Strictly unsafe to use an IP), if the first pod grabs the
>>> IP of the second node, then it should not be able to join the cluster.*
>>> *So can IPs still be swapped?*
>>> *When and how can this IP swap occur?*
>>>
>>> Regards
>>> Manish
>>>
>>> On Mon, Jul 6, 2020 at 10:40 PM Jeff Jirsa  wrote:
>>>


 On Mon, Jul 6, 2020 at 10:01 AM manish khandelwal <
 manishkhandelwa...@gmail.com> wrote:

> I have started reading about how to deploy Cassandra with K8S. But as
> I read more, I feel there are a lot of challenges in running Cassandra on
> K8s. Some of the challenges I see are:
>
> 1. Pod IP identification - If the pods go down and their IPs change when
> they come back up, how is that handled, since we depend on the IPs of
> Cassandra nodes for internode as well as client-server communication?
>

 Strictly safe to change an IP to an IP that is unused.
 Strictly unsafe to use an IP that's already in the cluster (so if two
 pods go down, having the first pod that comes up grab the IP of the second
 pod is strictly dangerous and will violate consistency and maybe lose
 data).


>
> 2. A K8S node can host a single pod. This is being done so that even if
> the host goes down, we have only a one-pod-down case. With multiple pods on
> a single host there is a risk of traffic failures, as consistency might not
> be achieved. But if we keep two pods of the same rack on a single host,
> are we safe, or is there any unknown risk?
>

 This sounds like you're asking if rack aware snitches protect you from
 concurrent pods going down. Make sure you're using a rack aware snitch.


>
> 3. Seed discovery? Again, as an extension of point 1, since IPs can
> change, how can we manage seeds?
>

 Either use DNS instead of static IPs, or use a seed provider that
 handles IPs changing.
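
As a rough illustration of the "seed provider that handles IPs changing"
option mentioned above, here is a minimal sketch of a DNS-backed seed
provider. It assumes the Cassandra 3.x SeedProvider interface and a
Kubernetes headless service; the class name, the "service" parameter and the
example DNS name are illustrative assumptions, not something prescribed in
this thread.

    // Minimal sketch of a DNS-backed seed provider (assumes Cassandra 3.x).
    // The package, "service" parameter and headless-service DNS name are
    // hypothetical examples.
    package example.k8s;

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.locator.SeedProvider;

    public class DnsSeedProvider implements SeedProvider {
        private final String serviceName;

        // Cassandra passes the "parameters" map from the seed_provider block
        // of cassandra.yaml to this constructor.
        public DnsSeedProvider(Map<String, String> params) {
            this.serviceName = params.getOrDefault(
                    "service", "cassandra-seeds.default.svc.cluster.local");
        }

        @Override
        public List<InetAddress> getSeeds() {
            try {
                // A headless service resolves to the current pod IPs of the
                // seed pods, so the seed list stays valid when pod IPs change.
                return Arrays.asList(InetAddress.getAllByName(serviceName));
            } catch (UnknownHostException e) {
                return Collections.emptyList();
            }
        }
    }

The class would then be referenced from the seed_provider section of
cassandra.yaml in place of SimpleSeedProvider, with the service name passed
under parameters.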


>
> 4. Also, I have read a lot about the use of Cassandra operators for
> maintaining a Cassandra cluster on Kubernetes. I think that a Cassandra
> operator is like a robot (automated admin) which works and acts like a
> normal admin would. I want to understand how important the Cassandra
> operator is, and what happens if we go to production without one?
>
> Regards
> Manish
>
 --
>>
>> 

Re: Cassandra on K8S

2020-08-03 Thread manish khandelwal
But again, if some Cassandra node (pod) with a particular IP X is down and a
second Cassandra node (pod) tries to take the IP X of the first Cassandra node,
the second Cassandra node should fail to join the cluster, as the Cassandra
cluster will complain that IP X is already occupied.
In that sense an actual swap of IPs should not occur between two nodes, and
data issues should not arise.

I am asking this because the only case where I can see an IP swap occurring is
when two Cassandra nodes are running on the same K8S host node. I am
evaluating how safe it is to run two Cassandra nodes on a single K8S host
node.

Regards
Manish



On Tue, Aug 4, 2020 at 9:01 AM Christopher Bradford 
wrote:

> In *most* k8s environments each Kubernetes worker receives its own
> dedicated CIDR range from the cluster’s CIDR space for allocating pod IP
> addresses. The issue described can occur when a k8s worker goes down, then
> comes back up and the pods are rescheduled, and either pod starts up with
> another pod’s previously used IP.
>
> They don’t necessarily have to swap 1:1 (i.e. one pod could use the other’s
> previous address while that pod receives a new address). Additionally, it’s
> not a race condition of which container starts first. The k8s scheduler and
> kubelet daemon assign IPs to pods.
>
> On Mon, Aug 3, 2020 at 11:14 PM manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>>> I have started reading about how to deploy Cassandra with K8S. But as I
>>> read more, I feel there are a lot of challenges in running Cassandra on K8s.
>>> Some of the challenges I see are:
>>>
>>> 1. Pod IP identification - If the pods go down and their IPs change when
>>> they come back up, how is that handled, since we depend on the IPs of
>>> Cassandra nodes for internode as well as client-server communication?
>>>
>>
>> Strictly safe to change an IP to an IP that is unused.
>> Strictly unsafe to use an IP that's already in the cluster (so if two
>> pods go down, having the first pod that comes up grab the IP of the second
>> pod is strictly dangerous and will violate consistency and maybe lose
>> data).
>>
>> *For point 2 (Strictly unsafe to use an IP), if the first pod grabs the
>> IP of the second node, then it should not be able to join the cluster.*
>> *So can IPs still be swapped?*
>> *When and how can this IP swap occur?*
>>
>> Regards
>> Manish
>>
>> On Mon, Jul 6, 2020 at 10:40 PM Jeff Jirsa  wrote:
>>
>>>
>>>
>>> On Mon, Jul 6, 2020 at 10:01 AM manish khandelwal <
>>> manishkhandelwa...@gmail.com> wrote:
>>>
 I have started reading about how to deploy Cassandra with K8S. But as I
 read more, I feel there are a lot of challenges in running Cassandra on K8s.
 Some of the challenges I see are:

 1. Pod IP identification - If the pods go down and their IPs change when
 they come back up, how is that handled, since we depend on the IPs of
 Cassandra nodes for internode as well as client-server communication?

>>>
>>> Strictly safe to change an IP to an IP that is unused.
>>> Strictly unsafe to use an IP that's already in the cluster (so if two
>>> pods go down, having the first pod that comes up grab the IP of the second
>>> pod is strictly dangerous and will violate consistency and maybe lose
>>> data).
>>>
>>>

 2. A K8S node can host a single pod. This is being done so that even if
 the host goes down, we have only a one-pod-down case. With multiple pods on
 a single host there is a risk of traffic failures, as consistency might not
 be achieved. But if we keep two pods of the same rack on a single host,
 are we safe, or is there any unknown risk?

>>>
>>> This sounds like you're asking if rack aware snitches protect you from
>>> concurrent pods going down. Make sure you're using a rack aware snitch.
>>>
>>>

 3. Seed discovery? Again, as an extension of point 1, since IPs can
 change, how can we manage seeds?

>>>
>>> Either use DNS instead of static IPs, or use a seed provider that
>>> handles IPs changing.
>>>
>>>

 4. Also, I have read a lot about the use of Cassandra operators for
 maintaining a Cassandra cluster on Kubernetes. I think that a Cassandra
 operator is like a robot (automated admin) which works and acts like a
 normal admin would. I want to understand how important the Cassandra
 operator is, and what happens if we go to production without one?

 Regards
 Manish

>>> --
>
> Christopher Bradford
>
>


Re: Cassandra on K8S

2020-08-03 Thread Christopher Bradford
In *most* k8s environments each Kubernetes worker receives its own
dedicated CIDR range from the cluster’s CIDR space for allocating pod IP
addresses. The issue described can occur when a k8s worker goes down, then
comes back up and the pods are rescheduled, and either pod starts up with
another pod’s previously used IP.

They don’t necessarily have to swap 1:1 (i.e. one pod could use the other’s
previous address while that pod receives a new address). Additionally, it’s
not a race condition of which container starts first. The k8s scheduler and
kubelet daemon assign IPs to pods.
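
One way to sanity-check the reused-IP scenario described above is to ask a
live node which peer addresses the cluster already knows about before a
restarted pod starts Cassandra. A rough sketch using the DataStax Java driver
3.x follows; the contact point, the idea of running it as a pre-start check,
and the class name are assumptions for illustration only.

    // Rough sketch: list the peer IPs the cluster already knows about so an
    // operator (or init container) can verify a pod's newly assigned IP is
    // not already registered to a different host. Contact point is a placeholder.
    import java.net.InetAddress;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class PeerAddressCheck {
        public static void main(String[] args) {
            String candidateIp = args[0]; // IP the restarted pod was given
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint("10.0.0.1")
                                          .build();
                 Session session = cluster.connect()) {
                for (Row row : session.execute(
                        "SELECT peer, host_id FROM system.peers")) {
                    InetAddress peer = row.getInet("peer");
                    UUID hostId = row.getUUID("host_id");
                    if (peer.getHostAddress().equals(candidateIp)) {
                        System.out.println("IP " + candidateIp
                                + " is already registered to host " + hostId);
                    }
                }
            }
        }
    }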

On Mon, Aug 3, 2020 at 11:14 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

>> I have started reading about how to deploy Cassandra with K8S. But as I
>> read more, I feel there are a lot of challenges in running Cassandra on K8s.
>> Some of the challenges I see are:
>>
>> 1. Pod IP identification - If the pods go down and their IPs change when
>> they come back up, how is that handled, since we depend on the IPs of
>> Cassandra nodes for internode as well as client-server communication?
>>
>
> Strictly safe to change an IP to an IP that is unused.
> Strictly unsafe to use an IP that's already in the cluster (so if two pods
> go down, having the first pod that comes up grab the IP of the second pod
> is strictly dangerous and will violate consistency and maybe lose data).
>
> *For point 2 (Strictly unsafe to use an IP), if the first pod grabs the
> IP of the second node, then it should not be able to join the cluster.*
> *So can IPs still be swapped?*
> *When and how can this IP swap occur?*
>
> Regards
> Manish
>
> On Mon, Jul 6, 2020 at 10:40 PM Jeff Jirsa  wrote:
>
>>
>>
>> On Mon, Jul 6, 2020 at 10:01 AM manish khandelwal <
>> manishkhandelwa...@gmail.com> wrote:
>>
>>> I have started reading about how to deploy Cassandra with K8S. But as I
>>> read more, I feel there are a lot of challenges in running Cassandra on K8s.
>>> Some of the challenges I see are:
>>>
>>> 1. Pod IP identification - If the pods go down and their IPs change when
>>> they come back up, how is that handled, since we depend on the IPs of
>>> Cassandra nodes for internode as well as client-server communication?
>>>
>>
>> Strictly safe to change an IP to an IP that is unused.
>> Strictly unsafe to use an IP that's already in the cluster (so if two
>> pods go down, having the first pod that comes up grab the IP of the second
>> pod is strictly dangerous and will violate consistency and maybe lose
>> data).
>>
>>
>>>
>>> 2. A K8S node can host a single pod. This is being done so that even if
>>> the host goes down, we have only a one-pod-down case. With multiple pods on
>>> a single host there is a risk of traffic failures, as consistency might not
>>> be achieved. But if we keep two pods of the same rack on a single host,
>>> are we safe, or is there any unknown risk?
>>>
>>
>> This sounds like you're asking if rack aware snitches protect you from
>> concurrent pods going down. Make sure you're using a rack aware snitch.
>>
>>
>>>
>>> 3. Seed discovery? Again, as an extension of point 1, since IPs can
>>> change, how can we manage seeds?
>>>
>>
>> Either use DNS instead of static IPs, or use a seed provider that handles
>> IPs changing.
>>
>>
>>>
>>> 4. Also, I have read a lot about the use of Cassandra operators for
>>> maintaining a Cassandra cluster on Kubernetes. I think that a Cassandra
>>> operator is like a robot (automated admin) which works and acts like a
>>> normal admin would. I want to understand how important the Cassandra
>>> operator is, and what happens if we go to production without one?
>>>
>>> Regards
>>> Manish
>>>
>> --

Christopher Bradford


Re: Cassandra on K8S

2020-08-03 Thread manish khandelwal
>
> I have started reading about how to deploy Cassandra with K8S. But as I
> read more, I feel there are a lot of challenges in running Cassandra on K8s.
> Some of the challenges I see are:
>
> 1. Pod IP identification - If the pods go down and their IPs change when
> they come back up, how is that handled, since we depend on the IPs of
> Cassandra nodes for internode as well as client-server communication?
>

Strictly safe to change an IP to an IP that is unused.
Strictly unsafe to use an IP that's already in the cluster (so if two pods
go down, having the first pod that comes up grab the IP of the second pod
is strictly dangerous and will violate consistency and maybe lose data).

*For point 2 (Strictly unsafe to use an IP), if the first pod grabs the IP
of the second node, then it should not be able to join the cluster.*
*So can IPs still be swapped?*
*When and how can this IP swap occur?*

Regards
Manish

On Mon, Jul 6, 2020 at 10:40 PM Jeff Jirsa  wrote:

>
>
> On Mon, Jul 6, 2020 at 10:01 AM manish khandelwal <
> manishkhandelwa...@gmail.com> wrote:
>
>> I have started reading about how to deploy Cassandra with K8S. But as I
>> read more, I feel there are a lot of challenges in running Cassandra on K8s.
>> Some of the challenges I see are:
>>
>> 1. Pod IP identification - If the pods go down and their IPs change when
>> they come back up, how is that handled, since we depend on the IPs of
>> Cassandra nodes for internode as well as client-server communication?
>>
>
> Strictly safe to change an IP to an IP that is unused.
> Strictly unsafe to use an IP that's already in the cluster (so if two pods
> go down, having the first pod that comes up grab the IP of the second pod
> is strictly dangerous and will violate consistency and maybe lose data).
>
>
>>
>> 2. A K8S node can host a single pod. This is being done so that even if
>> the host goes down, we have only a one-pod-down case. With multiple pods on
>> a single host there is a risk of traffic failures, as consistency might not
>> be achieved. But if we keep two pods of the same rack on a single host,
>> are we safe, or is there any unknown risk?
>>
>
> This sounds like you're asking if rack aware snitches protect you from
> concurrent pods going down. Make sure you're using a rack aware snitch.
>
>
>>
>> 3. Seed discovery? Again, as an extension of point 1, since IPs can
>> change, how can we manage seeds?
>>
>
> Either use DNS instead of static IPs, or use a seed provider that handles
> IPs changing.
>
>
>>
>> 4. Also, I have read a lot about the use of Cassandra operators for
>> maintaining a Cassandra cluster on Kubernetes. I think that a Cassandra
>> operator is like a robot (automated admin) which works and acts like a
>> normal admin would. I want to understand how important the Cassandra
>> operator is, and what happens if we go to production without one?
>>
>> Regards
>> Manish
>>
>


Re: Re: streaming stuck on joining a node with TBs of data

2020-08-03 Thread Jeff Jirsa
The memtable really isn't involved here: each data file is copied over as-is
and turned into a new data file; it doesn't get read into the memtable (it
does get deserialized and re-serialized, which temporarily holds it in memory,
but it is not in the memtable itself).

You can cut down on the number of data files copied in by using fewer
vnodes, or by changing your compaction parameters (e.g. if you're using
LCS, change sstable size from 160M to something higher), but there's no
magic to join / compact those data files on the sending side before sending.
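
Since the LCS target size is just a compaction sub-option, it can be raised
with a plain ALTER TABLE; a small sketch via the Java driver follows, where
the keyspace/table names, the contact point and the 320 MB target are
illustrative assumptions (existing sstables are only rewritten to the new
size as compaction proceeds).

    // Sketch: raise the LeveledCompactionStrategy target sstable size so that
    // streaming and compaction deal with fewer, larger files. "ks.events",
    // the contact point and the 320 MB value are placeholder assumptions.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseLcsSstableSize {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint("10.0.0.1")
                                          .build();
                 Session session = cluster.connect()) {
                session.execute(
                    "ALTER TABLE ks.events WITH compaction = {"
                    + " 'class': 'LeveledCompactionStrategy',"
                    + " 'sstable_size_in_mb': '320' }");
            }
        }
    }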


On Mon, Aug 3, 2020 at 4:15 AM onmstester onmstester
 wrote:

> IMHO (from reading system.log), each streamed-in file from any node is
> written down as a separate sstable to disk and does not wait in the memtable
> until enough memtable data has accumulated in memory, so there
> would be more compactions because of multiple small sstables. Is there any
> configuration in cassandra to force streamed-in data to go through the
> memtable-sstable cycle, to have bigger sstables in the first place?
>
> Sent using Zoho Mail 
>
>
>  Forwarded message 
> From: onmstester onmstester 
> To: "user"
> Date: Sun, 02 Aug 2020 08:35:30 +0430
> Subject: Re: streaming stuck on joining a node with TBs of data
>  Forwarded message 
>
> Thanks Jeff,
>
> I already used netstats and it only shows that streaming from a single node
> remained stuck, plus a bunch of dropped messages; next time I will check
> tpstats too.
> For now I stopped the joining/stuck node, set auto_bootstrap to
> false and started the node, and it is UN now. Is this OK too?
>
> What about streaming tables one by one, any idea?
>
> Sent using Zoho Mail 
>
>
>  On Sat, 01 Aug 2020 21:44:09 +0430 *Jeff Jirsa  >* wrote 
>
>
> Nodetool tpstats and netstats should give you a hint why it’s not joining
>
> If you don’t care about consistency and you just want it joined in its
> current form (which is likely strictly incorrect but I get it), “nodetool
> disablegossip && nodetool enablegossip” in rapid succession (must be less
> than 30 seconds in between commands) will PROBABLY change it from joining
> to normal (unclean, unsafe, do this at your own risk).
>
>
> On Jul 31, 2020, at 11:46 PM, onmstester onmstester <
> onmstes...@zoho.com.invalid> wrote:
>
> 
> No Secondary index, No SASI, No materialized view
>
> Sent using Zoho Mail 
>
>
>  On Sat, 01 Aug 2020 11:02:54 +0430 *Jeff Jirsa  >* wrote 
>
> Are there secondary indices involved?
>
> On Jul 31, 2020, at 10:51 PM, onmstester onmstester <
> onmstes...@zoho.com.invalid> wrote:
>
> 
> Hi,
>
> I'm going to join multiple new nodes to an already existing and running
> cluster. Each node should stream in >2TB of data, and it took a few days
> (with 500Mb streaming) to almost finish. But it got stuck streaming in
> from one final node, and I can not see any bottleneck on either side (source
> or destination node); the only problem is 400 pending compactions on the
> joining node, for which I disabled auto_compaction, but with no improvement.
>
> 1. How can I safely stop streaming/joining the new node and make it UN,
> then run repair on the node?
> 2. On bootstrapping a new node, multiple tables are streamed in
> simultaneously, and I think this increases the number of compactions
> compared with a scenario where "the joining node first streams in one table,
> then switches to another one, etc.". Am I right, and would this decrease
> compactions? If so, is there a config or hack in cassandra to force that?
>
>
> Sent using Zoho Mail 
>
>
>
>
>
>
>
>
>
>


Re: many instances of org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier$1 on the heap

2020-08-03 Thread jelmer
It did look like there were repairs running at the time. The
LiveSSTableCount for the entire node is about 2200 sstables; for the keyspace
that was being repaired it's just 150.
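
For reference, that per-table gauge can also be read over JMX; a quick sketch
follows, assuming the usual Cassandra 3.11 metrics MBean naming and the
default JMX port 7199, with the host and the ks/events names purely as
placeholders.

    // Sketch: read LiveSSTableCount for one table over JMX. The MBean name
    // pattern (org.apache.cassandra.metrics:type=Table,...) is assumed to be
    // the standard 3.11 layout; host, port and ks/events are placeholders.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LiveSstableCount {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                ObjectName gauge = new ObjectName(
                    "org.apache.cassandra.metrics:type=Table,"
                    + "keyspace=ks,scope=events,name=LiveSSTableCount");
                System.out.println("LiveSSTableCount = "
                    + mbeans.getAttribute(gauge, "Value"));
            }
        }
    }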

We run cassandra 3.11.6, so we should be unaffected by CASSANDRA-14096.

We use http://cassandra-reaper.io/ for the repairs



On Sat, 1 Aug 2020 at 01:49, Erick Ramirez 
wrote:

> I don't have specific experience relating to InstanceTidier but when I
> saw this, I immediately thought of repairs blowing up the heap. 40K
> instances indicates to me that you have thousands of SSTables -- are they
> tiny (like 1MB or less)? Otherwise, are they dense nodes (~1TB or more)?
>
> How do you run repairs? I'm wondering if it's possible that there are
> multiple repairs running in parallel like a cron job kicking in while the
> previous repair is still running.
>
> You didn't specify your C* version but my guess is that it's pre-3.11.5.
> FWIW the repair issue I'm referring to is CASSANDRA-14096 [1].
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-14096
>


Fwd: Re: streaming stuck on joining a node with TBs of data

2020-08-03 Thread onmstester onmstester
IMHO (from reading system.log), each streamed-in file from any node is written
down as a separate sstable to disk and does not wait in the memtable until
enough memtable data has accumulated in memory, so there would be
more compactions because of multiple small sstables. Is there any configuration
in cassandra to force streamed-in data to go through the memtable-sstable cycle,
to have bigger sstables in the first place?



Sent using https://www.zoho.com/mail/






 Forwarded message 
From: onmstester onmstester 
To: "user"
Date: Sun, 02 Aug 2020 08:35:30 +0430
Subject: Re: streaming stuck on joining a node with TBs of data
 Forwarded message 



Thanks Jeff,



I already used netstats and it only shows that streaming from a single node
remained stuck, plus a bunch of dropped messages; next time I will check
tpstats too.

For now I stopped the joining/stuck node, set auto_bootstrap to false and
started the node, and it is UN now. Is this OK too?



What about streaming tables one by one, any idea?



Sent using https://www.zoho.com/mail/






 On Sat, 01 Aug 2020 21:44:09 +0430 Jeff Jirsa  
wrote 





Nodetool tpstats and netstats should give you a hint why it’s not joining



If you don’t care about consistency and you just want it joined in its current 
form (which is likely strictly incorrect but I get it), “nodetool disablegossip 
&& nodetool enablegossip” in rapid succession (must be less than 30 seconds in 
between commands) will PROBABLY change it from joining to normal (unclean, 
unsafe, do this at your own risk).





On Jul 31, 2020, at 11:46 PM, onmstester onmstester 
 wrote:





No Secondary index, No SASI, No materialized view



Sent using https://www.zoho.com/mail/






 On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa  
wrote 



Are there secondary indices involved? 



On Jul 31, 2020, at 10:51 PM, onmstester onmstester 
 wrote:





Hi,



I'm going to join multiple new nodes to an already existing and running cluster.
Each node should stream in >2TB of data, and it took a few days (with 500Mb
streaming) to almost finish. But it got stuck streaming in from one final
node, and I can not see any bottleneck on either side (source or destination
node); the only problem is 400 pending compactions on the joining node, for
which I disabled auto_compaction, but with no improvement.



1. How can I safely stop streaming/joining the new node and make it UN, then
run repair on the node?

2. On bootstrapping a new node, multiple tables are streamed in simultaneously,
and I think this increases the number of compactions compared with a
scenario where "the joining node first streams in one table, then switches to
another one, etc.". Am I right, and would this decrease compactions? If so, is
there a config or hack in cassandra to force that?





Sent using https://www.zoho.com/mail/