Re: SIP-18: A Solr Kubernetes Module for native integration

Houston Putman Thu, 20 Apr 2023 14:27:35 -0700

Thanks for the questions Jason!

> So the general idea is that we'd add a Solr contrib/module, and that
> module would have a dep on some sort of Kubernetes client so it could
> manage certain Solr entities (e.g. security.json, configsets, etc.) as
> Kubernetes resources (configmaps, etc.).  Am I understanding that
> right?
>

Yes, absolutely. And possibly other things, like leveraging Kubernetes'
secrets management to manage credentials for users. (Auto-import
BasicAuth secrets with certain labels, integrate with Kubernetes
ServiceAccounts, etc.)

But yeah, generally the idea is to use Kubernetes state instead of
ZooKeeper state for certain features.

> One place there might be room for improvement in the writeup so far is
> around the motivation/value-prop for some of these Solr->Kubernetes
> integrations.  The value in some integrations (e.g.
> KubernetesSSLCredentialsProvider) is relatively self-evident I think,
> but others are a little less clear and could use spelled out
> explicitly IMO.  e.g. What's the benefit of storing security.json or
> configsets in Kubernetes configmaps over ZooKeeper?
>

This is a great question.

Generally Solr has fairly good tool support for managing various things
in ZooKeeper.

The "zkCli.sh" script and various "bin/solr" commands allow users to
easily manage their ZooKeeper state to set up Solr to run the way they
need it to. This works very well for users running Solr on bare metal
and running these commands manually.

However, running these commands in Kubernetes is not very convenient,
and it does not really jibe with Kubernetes' idempotent model. Basically
there isn't a good or easy way to run the solr/zk setup commands before
a SolrCloud is created. And when we do it in something like an
"initContainer", the commands have to be run every time a Solr process
is started (or restarted). This is inconvenient, and the added
complexity makes running Solr on Kubernetes much less appealing.

Another thing is state management. Let's say that the Solr Operator
wants to enable auth by default when running Solr. It has to create a
security.json for Solr to use, and generate passwords and secrets for
users to use. It also needs to set up a user & password for itself (the
operator) to use to interact with the cluster. That's OK: it can easily
upload this file to ZooKeeper in the initContainer if no security.json
already exists.

However, we need to allow users to update this file themselves to add
more users and make other changes. So basically we can't let the Solr
Operator make any changes to this file. Even if a user decides to change
the security.json secret they passed in to the SolrCloud, the operator
can't make that change happen, since it can't overwrite what already
exists in ZooKeeper. This will always be a problem when there are two
"sources of truth". One has to be prioritized.
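
To make the two-sources-of-truth problem concrete, here is a minimal
sketch of the "upload only if no security.json already exists" behaviour.
It is an illustration only: the dict stands in for ZooKeeper, the string
stands in for the user's Kubernetes secret, and the function name
bootstrap_security_json is hypothetical (it is not Solr Operator code).

```python
# Hypothetical in-memory stand-ins; not a real ZooKeeper or Kubernetes client.
def bootstrap_security_json(zk: dict, secret_value: str) -> bool:
    """Upload security.json only if ZooKeeper has none (create-if-absent).

    Returns True if the value was written, False if it was left alone.
    """
    if "/security.json" in zk:
        return False  # never overwrite: a user may have edited it in Solr
    zk["/security.json"] = secret_value
    return True


zk = {}
secret = '{"authentication": {"class": "solr.BasicAuthPlugin"}}'

# First pod start: ZooKeeper is empty, so the secret is uploaded.
first = bootstrap_security_json(zk, secret)

# The user later edits the secret; the initContainer runs again on restart.
secret = '{"authentication": {"class": "solr.BasicAuthPlugin", "blockUnknown": true}}'
second = bootstrap_security_json(zk, secret)

# The copy in ZooKeeper and the copy in the secret have now diverged.
diverged = zk["/security.json"] != secret
```

The second call is a no-op by design, which is exactly why the user's
updated secret never reaches Solr.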

If we allow the security.json to be loaded from a Kubernetes secret,
then the secret that the user provides is the single source of truth.
Even if the security.json is changed through the security UI, the
changes will be reflected in the Kubernetes secret. Users are then free
to overwrite that secret if they want to, since everyone knows it's the
current accepted state of the security.json file.
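
As a rough sketch of what "the secret is the single source of truth"
could look like, the snippet below reads and writes security.json
directly in a Secret-shaped dict. The only Kubernetes fact relied on is
that values under a Secret's "data" field are base64-encoded. The secret
name "solr-security", the key "security.json", and both helper functions
are assumptions made for illustration, not part of the SIP or of any
existing Solr plugin.

```python
import base64
import json


def load_security_json(secret: dict, key: str = "security.json") -> dict:
    """Decode security.json from a Secret-shaped dict (base64 under 'data')."""
    raw = base64.b64decode(secret["data"][key])
    return json.loads(raw)


def save_security_json(secret: dict, config: dict, key: str = "security.json") -> None:
    """Write an updated security.json back into the same Secret-shaped dict."""
    encoded = base64.b64encode(json.dumps(config).encode("utf-8"))
    secret["data"][key] = encoded.decode("ascii")


# Hypothetical user-provided secret, shaped like a Kubernetes Secret object.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "solr-security"},  # assumed name
    "data": {},
}
save_security_json(secret, {"authentication": {"class": "solr.BasicAuthPlugin"}})

# A change made "through the security UI" is written back to the secret...
config = load_security_json(secret)
config["authentication"]["blockUnknown"] = True
save_security_json(secret, config)

# ...so anyone reading the secret sees the current accepted state.
current = load_security_json(secret)
```

There is only one copy of the file here, so there is nothing for the
operator to reconcile or refuse to overwrite.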

The exact same issues exist with configSets. Many Solr Operator users
want to manage their configSets through Kubernetes, just like they
manage their SolrClouds. It makes sense: keep all of your Solr infra
managed together. However, the operator cannot keep the configSets
managed in ZooKeeper updated with the configSets managed via Kube
ConfigMaps. It's two sources of truth.

*TLDR*: Solr has many command line utilities that work well to set up
Solr when it's running on bare metal or a VM. However, these solutions
do not work well in a cloud system like Kubernetes. If we try to make
these things easier to set up in Kubernetes, it ultimately results in 2
sources of truth (Kubernetes and ZooKeeper). If we make plugins that
load these settings from Kubernetes instead of ZooKeeper, we are back
down to 1 source of truth. And this single source of truth (obviously)
works well in Kubernetes, because the settings are native Kubernetes
resources.

- Houston

On Tue, Apr 11, 2023 at 2:36 PM Jason Gerlowski <[email protected]>
wrote:

> Hi Houston,
>
> So the general idea is that we'd add a Solr contrib/module, and that
> module would have a dep on some sort of Kubernetes client so it could
> manage certain Solr entities (e.g. security.json, configsets, etc.) as
> Kubernetes resources (configmaps, etc.).  Am I understanding that
> right?
>
> > Please let me know if I can explain more, or how I can make the SIP
> > page better.
>
> One place there might be room for improvement in the writeup so far is
> around the motivation/value-prop for some of these Solr->Kubernetes
> integrations.  The value in some integrations (e.g.
> KubernetesSSLCredentialsProvider) is relatively self-evident I think,
> but others are a little less clear and could use spelled out
> explicitly IMO.  e.g. What's the benefit of storing security.json or
> configsets in Kubernetes configmaps over ZooKeeper?
>
> Best,
>
> Jason
>
> On Wed, Apr 5, 2023 at 12:45 PM Houston Putman <[email protected]> wrote:
> >
> > Hey everyone,
> >
> > This is a new SIP, not a duplicate of SIP-17 (Autoscaling on
> > Kubernetes), and completely unrelated.
> >
> > Basically there is a lot of very messy logic we do in the Solr Operator
> > to bootstrap security and manage various things. This logic must exist
> > because Solr has no idea that Kubernetes exists.
> > If we can use Kubernetes APIs to pull in information, instead of relying
> > on the Solr Operator to inject that information in hacky-ways, the user
> > experience on Kubernetes is going to get many times better for users
> > wanting to secure their SolrClouds. This will also help us use
> > authorization by default (which we always preach) via the Solr Operator.
> >
> > This SIP is not very filled out because I'm still thinking on various
> > aspects. But in general, we can attack the different plugins one-by-one
> > and the SIP can evolve throughout the process. This SIP is very easy to
> > break up, which is nice.
> >
> > Please let me know if I can explain more, or how I can make the SIP page
> > better.
> >
> > - Houston