Re: Kubernetes Operator: Can We Preserve CassKop's Flexibility?

Cyril Scetbon Wed, 07 Oct 2020 20:00:16 -0700

Thank you, Tom, for your support. As one of the main contributors to CassKop, 
I'm happy to see that the effort we put into supporting as many configurations 
as possible is appreciated.


When we first started to talk about creating a Kubernetes operator, we always 
mentioned the features that we added and the importance of trying to fulfill 
the needs of every user. Each of those choices has a reason: a situation that 
happened in production, a configuration that we used to apply to some of our 
clusters, or a situation that could potentially happen and that we needed to 
overcome. One example is that IPs can change when a Kubernetes node restarts, 
and IPs can even be exchanged between two nodes of the same cluster. We 
therefore implemented a detection algorithm that, when it sees this happening, 
tries to restart the pods, which should then get new IPs and solve the problem: 
https://orange-opensource.github.io/casskop/docs/3_configuration_deployment/9_advanced_configuration#cross-ip-management
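
To illustrate the kind of check involved, here is a minimal sketch in Python. 
It is not CassKop's actual implementation: the function name, the pod names 
and the IPs are all hypothetical, and it assumes we already have the 
pod-to-IP mapping from before and after the node restart. A pod is flagged 
when the IP it now holds previously belonged to a different pod.

```python
from typing import Dict, List


def find_cross_ip_pods(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the pods whose current IP used to belong to a different pod.

    Hypothetical helper for illustration only; both arguments map
    pod name -> pod IP.
    """
    # Invert the previous mapping so we can look up who owned each IP.
    previous_owner = {ip: pod for pod, ip in previous.items()}

    suspects = []
    for pod, ip in current.items():
        owner = previous_owner.get(ip)
        # The IP was in use before, but by another pod: an exchanged IP.
        if owner is not None and owner != pod:
            suspects.append(pod)
    return sorted(suspects)


# Example data (made up): cassandra-0 and cassandra-1 swapped their IPs.
previous = {"cassandra-0": "10.0.0.1", "cassandra-1": "10.0.0.2", "cassandra-2": "10.0.0.3"}
current = {"cassandra-0": "10.0.0.2", "cassandra-1": "10.0.0.1", "cassandra-2": "10.0.0.3"}

print(find_cross_ip_pods(previous, current))  # ['cassandra-0', 'cassandra-1']
```

In this sketch the returned pods are the candidates for a restart; a pod that 
simply received a brand-new IP is not flagged.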

The features we added solve use cases that happened in production or that 
could happen because of the environment, and we tried to make them as simple 
and intuitive as possible. We also put a lot of effort into the documentation, 
which is not perfect but serves the purpose of explaining and detailing how to 
use CassKop.

Soon, when we start talking about porting our features, we'll of course stress 
the importance of making the operator as open as possible (tbh we had in mind 
to support any recent Cassandra version and even ScyllaDB), simple, 
configurable and adaptable. Of course, not all versions are supported even by 
CassKop, because we make some Jolokia calls, and if the JMX beans change, some 
important operations could stop working. For instance, we check that a 
datacenter has no data replicated to it before decommissioning it: 
https://orange-opensource.github.io/casskop/docs/5_operations/1_cluster_operations#updatescaledown
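
As a rough illustration of that pre-decommission check, here is a minimal 
Python sketch. It is not CassKop's code and makes no Jolokia calls: it 
assumes the per-keyspace replication settings have already been fetched and 
are passed in as a plain dictionary (keyspace name -> datacenter name -> 
replication factor). The function names, keyspaces and datacenter names are 
hypothetical.

```python
from typing import Dict, List

# Assumed shape: keyspace -> {datacenter: replication factor}.
Replication = Dict[str, Dict[str, int]]


def keyspaces_replicating_to(dc: str, replication: Replication) -> List[str]:
    """List the keyspaces that still replicate data to the given datacenter."""
    return sorted(ks for ks, per_dc in replication.items() if per_dc.get(dc, 0) > 0)


def can_decommission(dc: str, replication: Replication) -> bool:
    """A datacenter is safe to remove only if no keyspace replicates to it."""
    return not keyspaces_replicating_to(dc, replication)


# Example data (made up): "events" still has replicas in dc2.
replication = {
    "users": {"dc1": 3},
    "events": {"dc1": 3, "dc2": 2},
}

print(keyspaces_replicating_to("dc2", replication))  # ['events']
print(can_decommission("dc2", replication))          # False
print(can_decommission("dc3", replication))          # True
```

The point of returning the offending keyspaces, rather than only a boolean, 
is that the operator can tell the user what has to be altered first.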

I had a few discussions with some of the cass-operator developers, and I think 
we understood each other: for the operator to be adopted and the work to be 
fruitful, no feature should be lost along the way, and if there is a better 
way to do things, we'll find it together. Orange also uses CassKop and will 
keep using it as long as the crucial features are not available in 
cass-operator. We'll also have to find a way to migrate from CassKop to 
cass-operator without breaking everything. But let's walk before we run 😉

—
Cyril Scetbon

> On Oct 7, 2020, at 2:23 PM, Tom Offermann <tofferm...@newrelic.com.INVALID> 
> wrote:
> 
> I've been following the discussion about Kubernetes operators with a great
> deal of interest. At New Relic, we're about to move our Cassandra Clusters
> from bare-metal hosts in our datacenters to Kubernetes clusters in AWS, so
> we've been looking closely at the current operators.
> 
> Our goals:
> 
> * Don't write our own operator.
> 
> * Choose the community standard, if possible. If not possible, choose an
> operator with active development, usage, and community.
> 
> * Choose an operator that can work with our existing way of managing
> clusters. Most significantly, at New Relic we do not use virtual nodes in
> our Cassandra clusters. Instead, we continue to assign initial_tokens to
> individual nodes. While we certainly don't expect an operator to support
> this use case by default, we do hope that an operator will make it
> possible.
> 
> * Don't run a forked version of the operator.
> 
> Both [cass-operator][1] and [CassKop][2] worked very well and we were
> really impressed with both of them. Heading into the evaluation, we
> expected to choose Datastax's cass-operator. Given Datastax's position in
> the Cassandra community, and given that they wrote the most widely-used
> Cassandra clients, they seemed like they would be in the best position to
> provide the community standard.
> 
> We ended up choosing CassKop.
> 
> However, I don't want this email to be viewed as lobbying for choosing
> one operator over another. I'm excited about the possibility that's
> currently being discussed of merging development efforts and incorporating
> CassKop features into cass-operator.
> 
> I do want to highlight some of the advantages that CassKop currently offers
> for our use case, in the hope that we can preserve those advantages going
> forward. (Or, even improve them!)
> 
> 1. CassKop offers a huge amount of flexibility for modifying Cassandra
> configuration files. If needed, you can swap in your own [bootstrap][3]
> docker image to manipulate the Cassandra configuration files, but
> oftentimes you don't even need to do that. Since CassKop offers the ability
> to define a pre_run.sh script that will run in the bootstrap container, you
> can get pretty far with some shell scripting. In our pre_run.sh, we do
> per-pod configuration to assign initial token values.
> 
> We didn't see an easy way to perform per-pod configuration with
> cass-operator. There is no equivalent pre_run.sh hook in
> [cass-config-builder][4], which is the init container in cass-operator
> that's comparable to CassKop's bootstrap container.
> 
> 2. CassKop is less opinionated about which Cassandra version you want to
> run. My understanding is that cass-config-builder adds a layer of
> abstraction so that it will produce configuration that is tailored to
> certain versions of open-source and DSE Cassandra. Which works great,
> unless you want to run a version of Cassandra that isn't supported. We were
> surprised to see that cass-operator only works with a [handful of Cassandra
> versions][5].
> 
> There didn't seem to be an easy way to use cass-operator with an earlier
> version of Cassandra than those that are officially supported.
> 
> 3. CassKop requires adoption of fewer, less-complex components. CassKop's
> bootstrap container was easier for us to wrap our heads around than
> cass-config-builder. In addition, using cass-operator also required the
> usage of the [management-api][6] sidecar. This means that the adoption of a
> new operator also required the adoption of a new sidecar as well. Perhaps
> this is overstated, but it felt like choosing cass-operator required
> embracing a whole ecosystem, rather than simply an operator.
> 
> Now, if the management-api sidecar was widely used throughout the
> community, then I wouldn't feel the same reluctance to use it. Knowing that
> it was going to be the community standard moving forward would be a big
> help. But, until it achieves that role as the standard, then choosing
> cass-operator means choosing both an operator and a sidecar, when there's
> no guarantee that either of them will become the standard. It's a bigger
> commitment.
> 
> I realize that the concerns we have when choosing an operator may not be
> shared by all. I raise these points with the hope that we can keep them in
> mind. It's possible to build flexibility into a Cassandra operator, so that
> it can be used in ways that deviate from the default, or even used in ways
> that the original authors didn't anticipate.
> 
> I do want to thank both Orange and Datastax for all of the work they've put
> into their operators, as well as everyone here discussing the best way to
> move forward. We are super appreciative and I'm optimistic that some of us
> at New Relic will be in a position soon to be able to contribute to these
> efforts.
> 
> Thanks,
> Tom
> 
> [1]: https://github.com/datastax/cass-operator
> [2]: https://github.com/Orange-OpenSource/casskop
> [3]:
> https://github.com/Orange-OpenSource/casskop/tree/master/docker/bootstrap
> [4]: https://github.com/datastax/cass-config-builder
> [5]:
> https://github.com/datastax/cass-operator/blob/master/operator/deploy/crds/cassandra.datastax.com_cassandradatacenters_crd.yaml#L6029-L6040
> [6]: https://github.com/datastax/management-api-for-apache-cassandra
> 
> -- 
> Tom Offermann
> Lead Software Engineer
> http://newrelic.com
