Re: Leader Election Issues During Rolling TLS Migration

Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) Sat, 29 Mar 2025 18:52:41 -0700

Hi David,

I appreciate the partial answers and apologize for the
late reply.

I did read up a little on Istio/Envoy/iptables, and it does
sound interesting, although it would be a massive architectural
shift for us, with its own maintenance costs and learning curve.
There are definitely benefits: I've already run into situations
where I wished I had a service mesh so that, for instance,
switching Solr 5-8 clouds to TLS worked the same way as on
Solr 9, but alas...

Just for my own understanding: each Solr node in your set-up
has its own Envoy sidecar, and the cloud is technically running
in HTTP mode because the Envoy-to-Envoy traffic is already
TLS? I assume, then, that you must override each node's Solr
host name with some FQDN that Istio/Envoy can resolve?

At any rate, after hearing your take, I think I'll skip the
rabbit hole of figuring out why "Solr-native" TLS has this
weird base_url corruption during the transition, since it
isn't the most modern way of doing it anyway, and anyone
converting to TLS should be encouraged to do so at the infra
level.
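For context on why a stored base_url can go stale when a node switches schemes, here is a minimal sketch of how a base URL can be derived from a SolrCloud-style node name. It assumes node names of the form "host:port_context" (e.g. "10.0.0.5:8983_solr") with a URL-encoded context; the function name and host names are hypothetical, and this illustrates the idea rather than diagnosing the actual bug.

```python
from urllib.parse import unquote


def base_url_for_node_name(node_name: str, url_scheme: str) -> str:
    """Build a base URL from a node name like "host:port_context".

    Assumption: everything before the first "_" is host:port and the rest
    is the URL-encoded context path. Hypothetical helper, not Solr's API.
    """
    host_port, sep, context = node_name.partition("_")
    if not sep:
        raise ValueError(f"unexpected node name: {node_name!r}")
    path = unquote(context)
    # Only the scheme differs between the pre- and post-TLS URLs.
    return f"{url_scheme}://{host_port}" + (f"/{path}" if path else "")


node = "solr-1.example.internal:8983_solr"  # hypothetical node name
before = base_url_for_node_name(node, "http")   # recorded before the switch
after = base_url_for_node_name(node, "https")   # what the node serves after restart
print(before, after, before == after)
```

The node name is the same before and after, but any replica entry that still carries the old http URL points at a scheme the restarted node no longer serves. In the mesh set-up asked about above, the scheme would stay "http" and only the host part (a resolvable FQDN) would change.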

Luke

From: dev@solr.apache.org At: 02/27/25 17:41:10 UTC-5:00 To: dev@solr.apache.org
Subject: Re: Leader Election Issues During Rolling TLS Migration

Hi Luke,

I have partial answers...

On Wed, Feb 26, 2025 at 2:34 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A)
<lkotzanie...@bloomberg.net> wrote:

> 1. Is our proposed patch a reasonable solution, or are we overlooking
> other critical factors?
>

Your proposal at least sounds safe and low-risk. Hopefully others with
experience in the scenario you describe will pipe up.

I'd like to take this opportunity to mention that a containerized service
(Solr or whatever) doesn't need to intrinsically support TLS at all for
you to get the benefits of TLS, especially if you're interested in mTLS.
You can get there by embracing an Istio service mesh:
https://istio.io/latest/about/service-mesh/. It sidesteps Solr's TLS
aspects and ultimately simplifies and harmonizes everything, especially in
the bigger picture with multiple services. It's invisible to services on
the mesh.

> 2. Is it generally advisable to co-locate multiple collections in the same
> Solr processes? Beyond this limited corner case, we understand there may be
> bigger potential problems with such a set-up. For instance, Lucene codec
> upgrades that are not forward-compatible (e.g., older tlogs attempting to
> read segments written with newer codecs) can prove problematic.
>

There's no concern with mixing collections on the same node; we (Salesforce)
run with crazy numbers, albeit with some Solr hacks, but you can get to the
single-digit thousands without much problem in pure SolrCloud.

The codec upgrade issue is a nuisance during an in-place upgrade: some noisy
logs and errors and a temporary loss of indexing ability, but that's it. We
went through this in-place across hundreds of clusters, and nothing
persisted past the rolling restart or halted it. Ideally, Solr would be
version-aware when choosing shard leaders. Oh, how I wish to change the
shard leadership code.
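One possible reading of "version-aware" leadership, sketched below under stated assumptions: since Lucene can read older index formats but not newer ones, a leader on the lowest Solr version among active replicas never produces segments that a not-yet-upgraded replica cannot open. The Replica type, its solr_version field, and choose_leader are hypothetical; this is not Solr's election code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Replica:
    name: str
    node: str
    solr_version: Tuple[int, int, int]  # hypothetical field, e.g. (9, 7, 0)
    active: bool


def choose_leader(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Hypothetical policy: among active replicas, prefer the lowest Solr
    version, breaking ties by replica name so the choice is deterministic."""
    candidates = [r for r in replicas if r.active]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.solr_version, r.name))


replicas = [
    Replica("core_node1", "a:8983_solr", (9, 8, 0), True),   # already upgraded
    Replica("core_node2", "b:8983_solr", (9, 7, 0), True),   # not yet upgraded
    Replica("core_node3", "c:8983_solr", (9, 6, 0), False),  # down, ineligible
]
leader = choose_leader(replicas)
print(leader.name if leader else None)
```

A trade-off of this policy is that it keeps leadership on the nodes that a rolling upgrade will restart next, so leadership may move more often during the roll.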

~ David
