Hi David,

I appreciate the partial answers and apologize for the late reply.
I did read up a little on Istio/Envoy/iptables, and it does sound interesting, although it would be a massive architectural shift for us, with its own maintenance costs and learning curve. There are definitely benefits... I've already run into situations where I wished I had a service mesh so that, for instance, switching Solr 5-8 clouds to TLS worked the same way as on Solr 9, but alas.

Just for my own understanding: each Solr node in your setup has its own Envoy sidecar, and the cloud is technically running in HTTP mode because the Envoy-to-Envoy traffic is already TLS? I assume, then, that you must override the Solr host name of each node to some FQDN that Istio/Envoy can resolve?

At any rate, after hearing your take I think I'll skip the rabbit hole of figuring out why the "Solr-native" TLS has this weird base_url corruption during the transition. This isn't the most modern way of doing it anyway, and anyone converting to TLS should be encouraged to do so at the infrastructure level.

Luke

From: dev@solr.apache.org At: 02/27/25 17:41:10 UTC-5:00
To: dev@solr.apache.org
Subject: Re: Leader Election Issues During Rolling TLS Migration

Hi Luke,

I have partial answers...

On Wed, Feb 26, 2025 at 2:34 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) <lkotzanie...@bloomberg.net> wrote:

> 1. Is our proposed patch a reasonable solution, or are we overlooking
> other critical factors?

Your proposal at least sounds safe / low risk. Hopefully others with experience in the scenario you describe will pipe in.

I'd like to take this opportunity to mention that a containerized service (Solr or whatever) doesn't need to intrinsically support TLS at all for you to get the benefits of TLS, especially if you're interested in mTLS. You can get there by embracing an Istio service mesh:
https://istio.io/latest/about/service-mesh/
It sidesteps Solr's TLS aspects and ultimately simplifies and harmonizes everything, especially in the bigger picture with multiple services. It's invisible to services on the mesh.

> 2. Is it generally advisable to co-locate multiple collections on the same
> Solr processes? Beyond this limited corner case, we understand there may be
> bigger potential problems of such a set-up. For instance, non-forward
> compatible Lucene codec upgrades (e.g., older tlogs attempting to read
> segments written with newer codecs) can prove problematic.

There's no concern with mixing collections on the same node. We (Salesforce) run with crazy numbers, albeit with some Solr hacks, but you can get to single-digit thousands without much problem in pure SolrCloud.

The codec upgrade issue is a nuisance for an in-place upgrade: some noisy logs/errors and a temporary loss of indexing ability, but that's it. We went through this in place across hundreds of clusters, and nothing persisted past the rolling restart or stopped it. Ideally Solr would be version-aware in its leadership choice. Oh, how I wish to change the shard leadership code.

~ David