I think it's important to be able to roll out changes to nodes in a way the
user controls (e.g. one node at a time), instead of only having an
all-at-once option.  I really liked Tomas's explanation of the need.  The
same need exists for collections, and Solr satisfies that today via
configSets.  Create my-configset-v2 with some new but maybe buggy stuff,
then roll it out slowly to collections as you wish (by using
MODIFYCOLLECTION) -- it needn't be all-at-once.  The package manager should
work fine with that, because the package is tied at the configSet level in
params.json.  I don't know how node-level handlers are registered in the
package manager, though.  If, hypothetically, there were an option to tie it
via some file on disk (be it solr.xml or something else), then a user
wanting to do this would be empowered to.
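To make that rollout concrete, here's a minimal sketch (the host, collection,
and configset names are hypothetical) of pointing collections at the new
configSet one at a time via MODIFYCOLLECTION:

```python
# Sketch of a gradual configSet rollout via the Collections API
# (MODIFYCOLLECTION with collection.configName). Host, collection,
# and configset names below are hypothetical.
from urllib.parse import urlencode

def modify_collection_url(base, collection, configset):
    """Build the MODIFYCOLLECTION URL that points a collection at a configSet."""
    params = {
        "action": "MODIFYCOLLECTION",
        "collection": collection,
        "collection.configName": configset,
    }
    return f"{base}/admin/collections?{urlencode(params)}"

# Roll my-configset-v2 out one collection at a time, not all at once:
for coll in ["products-canary"]:  # widen this list as confidence grows
    url = modify_collection_url("http://localhost:8983/solr", coll, "my-configset-v2")
    print(url)
    # urllib.request.urlopen(url)  # would issue it against a live cluster
```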

Anyway, it appears SIP-11 Uniform cluster-level configuration API
<https://cwiki.apache.org/confluence/display/SOLR/SIP-11+Uniform+cluster-level+configuration+API>
/
SOLR-14843 is where this discussion has gone.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Sep 3, 2020 at 3:03 PM Tomás Fernández Löbbe <tomasflo...@gmail.com>
wrote:

> Thanks Ishan,
> I still don't think it covers the cases very well. The possibilities of
> how that handler could be screwing up things are infinite (it could be
> corrupting local cores, causing OOMs, it could be spawning infinite loops,
> you name it). If the new handler requires initialization that reaches out
> to an external system, a large enough cluster means this can hit
> throttling or even take down something if you start them all atomically.
> I'm fine with Solr supporting atomic deployments with packages and such,
> but I'm not fine with that being the only way to deploy Solr; it may not
> be suitable for all use cases.
>
> Also, your workaround requires a ton of knowledge of Solr APIs and
> internals, vs a simpler and more standard approach where there are two
> versions (Docker images, AMIs, tars, whatever you use): old and new.  Add
> "new" and remove "old" in your preferred way. This is exactly the same
> you'll do when you need to upgrade Solr BTW, so it needs to be handled
> anyways.
>
> On Thu, Sep 3, 2020 at 11:35 AM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Hmmm, interesting point about deliberately changing one solr.xml for
>> testing purposes. To emulate that on a per-node basis you’d have to have
>> something like a “node props” associated with each node, which my instant
>> reaction to is “yuuuucccckkkk”.
>>
>> As far as API only, I’d assumed changes to clusterprops could be either
>> way. If we allow Solr to start with no clusterprops, then the API route
>> would create one. Pros can go ahead and hand-edit one and push it up if
>> they want.
>>
>> In your nightmare scenario, where are the ZK’s located? Are they still
>> running somewhere? Could you hand-edit clusterprops and push it to ZK?
>>
>> I wish everyone would just use Solr the way I think about it ;)
>>
>> > On Sep 3, 2020, at 2:11 PM, Tomás Fernández Löbbe <
>> tomasflo...@gmail.com> wrote:
>> >
>> > I can see that some of these configurations should be moved to
>> clusterprops.json, but I don’t believe that's the case for all of them. Some
>> are configurations that target the local node (e.g. the sharedLib path),
>> and some are needed before connecting to ZooKeeper (the zk config). As for
>> configuration of global handlers and components: while in general you do
>> want the same conf across all nodes, you may not want changes to take
>> effect atomically, and may instead rely on a phased upgrade (rolling,
>> blue/green, etc.), where the conf goes together with the binaries being
>> deployed. I also fear that making the configuration of some of these
>> components dynamic means we have to make the code handle them dynamically
>> (e.g. recreate the CollectionsHandler based on a callback from ZooKeeper).
>> This would rarely be used in reality, but all our code would need to be
>> restructured to handle it; I fear this will complicate the code needlessly
>> and may introduce leaks and races of all kinds. If those components can
>> have configuration that should be dynamic (some toggle, threshold, etc.),
>> I’d love to see those as clusterprops, key-value mostly.
>> >
>> > If we were to put this configuration in clusterprops, would that mean
>> I’m only able to make config changes via an API? On a new cluster, do I
>> need to start Solr and then make a Collections API call to change the
>> collections handler? Or am I supposed to manually change the clusterprops
>> file before starting Solr and push it to ZooKeeper (having a file intended
>> for both manual edits and API edits is bad IMO)? Maybe via the CLI, but
>> still, I’d need to do this for every cluster I create (vs. having the
>> solr.xml in my source repository and Docker image, for example). And I'd
>> lose the ability to have this configuration in my git repo?
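>> > (For concreteness, the API route being discussed is the CLUSTERPROP
>> action of the Collections API; a minimal sketch follows, where the host and
>> the property/value pair are illustrative only.)

```python
# Sketch: the CLUSTERPROP Collections API action, which is the "API route"
# for editing clusterprops.json. Host, property name, and value are
# illustrative only.
from urllib.parse import urlencode

def clusterprop_url(base, name, val):
    """Build the CLUSTERPROP URL that sets a single key-value cluster property."""
    params = {"action": "CLUSTERPROP", "name": name, "val": val}
    return f"{base}/admin/collections?{urlencode(params)}"

print(clusterprop_url("http://localhost:8983/solr", "urlScheme", "https"))
# urllib.request.urlopen(...) would issue it against a live cluster
```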
>> >
>> > I'm +1 to keep a node's configuration local to the node in the
>> filesystem. Currently, that's solr.xml. I've seen comments about XML being
>> difficult to read/write; I think that's personal preference, so while I
>> don't see it that way, I understand lots of people do and that things have
>> been moving away to other formats. I'm open to discussing that as a change.
>> >
>> > > However, 1, 2, and 3, are not trivial for a large number of Solr
>> nodes and if they aren’t right diagnosing them can be “challenging”…
>> > In my mind, solr.xml goes with your code. Having it up to date means
>> having all your nodes running the same version of your code. As I said,
>> this is the "desired state" of the cluster, but may not be the case all the
>> time (i.e. during deployments), and that's fine. Depending on how you
>> manage the cluster, you may want to live with different versions for some
>> time (you may have canaries or be doing a blue/green deployment, etc).
>> Realistically speaking, if you have a 500+ node cluster, you must have a
>> system in place to manage configuration and versions; let's not bend over
>> backwards for a situation that isn't that realistic.
>> >
>> > Let me put an example of things I fear with making these changes
>> atomic. Let's say I want to start using a new, custom HealthCheckHandler
>> implementation, that I have put in a jar (and let's assume the jar is
>> already in all nodes). If I use solr.xml (where one can currently
>> configure this implementation), I can do a phased deployment (yes, this is
>> a restart of all nodes). If the healthcheck handler is buggy and fails
>> requests, the nodes with the new code will never show as healthy, so the
>> deployment will likely stop (i.e. if you are using Kubernetes and using
>> probes, those instances will keep restarting, if you use ASG in AWS you can
>> do the same thing). If you make it an atomic change, bye-bye cluster, all
>> nodes will start reporting unhealthy (Kubernetes and ASG will kill all
>> those nodes). Good luck doing API changes to revert now, there is no node
>> to respond to those requests. Hopefully you were using some sort of stable
>> storage because all ephemeral is gone. Bringing back that cluster is going
>> to be a PITA. I have seen similar things happen.
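>> > (If I remember the syntax right, the solr.xml knob in question looks
>> roughly like the fragment below; the handler class name is hypothetical.)

```xml
<!-- Fragment of solr.xml (sketch): plugging in a custom health check
     handler. com.example.CanaryHealthCheckHandler is a hypothetical class
     packed in a jar already present on every node. -->
<solr>
  <str name="healthCheckHandler">com.example.CanaryHealthCheckHandler</str>
</solr>
```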
>> >
>> >
>> > On Thu, Sep 3, 2020 at 9:40 AM Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> > bq.  Isn’t solr.xml a way to hardcode config in a more flexible way
>> than a Java class?
>> >
>> > Yes, and the problem word here is “flexible”. For a single-node system
>> that flexibility is desirable. Flexibility comes at the cost of complexity,
>> especially in the SolrCloud case. In this case, not so much Solr code
>> complexity as operations complexity.
>> >
>> > For me this isn’t so much a question of functionality as
>> administration/troubleshooting/barrier to entry.
>> >
>> > If:
>> > 1. you can guarantee that every solr.xml file on every node in your
>> entire 500 node cluster is up to date
>> > 2. or you can guarantee that the solr.xml stored on ZooKeeper is up to date
>> > 3. and you can guarantee that clusterprops.json in cloud mode is
>> interacting properly with whichever solr.xml is read
>> > 4. Then I’d have no problem with solr.xml.
>> >
>> > However, 1, 2, and 3, are not trivial for a large number of Solr nodes
>> and if they aren’t right diagnosing them can be “challenging”…
>> >
>> > Imagine all the ways that “somehow” the solr.xml file on one or more
>> nodes of a 500 node cluster didn’t get updated and you’re trying to
>> track down why query X isn’t working as you expect. Some of the time. When
>> you happen to hit conditions X, Y and Z on a subrequest that goes to the
>> node in question (which won’t be all of the time, or even possibly a
>> significant fraction of the time). Do Containers matter here? Some glitch
>> in Puppet or similar? Somebody didn’t follow every step in the process in
>> the playbook? It doesn’t matter how you got into this situation, tracking
>> it down would be a nightmare.
>> >
>> > Or, for that matter, you’ve solved all the distribution concerns and
>> _can_ guarantee 1 and 3. Then somebody pushes a solr.xml to ZK either
>> intentionally or by mistake (OH, I thought I was on the QA system, oops).
>> Now I get to spend a week tracking down why the guarantee of 1 is still
>> true, it’s just not relevant any more.
>> >
>> > To me, it’s the same problem that is solved by the blob store for jar
>> files, or having configsets in ZK. When I want something available to all
>> my Solr instances, I do not want to have to run around to every node and
>> determine that the object I copied there is the right one, especially if
>> I’m trying to track down a problem.
>> >
>> > Sure, all my concerns can be solved, but why make it harder than it
>> needs to be? Distributed systems are hard enough already…
>> >
>> > FWIW,
>> > Erick
>> >
>> >
>> >
>> >
>> > > On Sep 3, 2020, at 11:00 AM, Ilan Ginzburg <ilans...@gmail.com>
>> wrote:
>> > >
>> > >  Isn’t solr.xml a way to hardcode config in a more flexible way
>> than a Java class?
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>>
>>
