Which means whoever wants to make changes to Solr needs to be able/willing/competent to build AMIs/Docker images/etc. ... and one has to manage versions of those image variants as opposed to managing versions of config files.
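To make that concrete, baking the node-level config into the image (the approach Tomás describes below) would look roughly like this sketch; the base image tag and the target path are assumptions and depend on how SOLR_HOME is laid out in the image:

    FROM solr:8.6
    # Bake the node-level config into the image so it is versioned and
    # rolled out together with the binaries. The destination path is an
    # assumption; it has to match wherever this image actually reads
    # solr.xml from (SOLR_HOME or the default server/solr directory).
    COPY solr.xml /opt/solr/server/solr/solr.xml

Every change to solr.xml then means building, tagging and rolling out a new image instead of editing a file in place.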
On Fri, Aug 28, 2020 at 1:55 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> I think if you are using AMIs (or Docker), you could put the node configuration inside the AMI (or Docker image), as Ilan said, together with the binaries. Say you have a custom top-level handler (Collections, Cores, Info, whatever) which takes some arguments and is configured in solr.xml, and you are doing an upgrade: you probably want your old nodes (running with your old AMI/Docker image with old jars) to keep the old configuration and your new nodes to use the new.
>
> On Fri, Aug 28, 2020 at 10:42 AM Gus Heck <gus.h...@gmail.com> wrote:
>
>> Putting solr.xml in zookeeper means you can add a node simply by starting solr pointing to the zookeeper, and ensure a consistent solr.xml for the new node if you've customized it. Since I rarely (never) hit use cases where I need a different per-node solr.xml, I generally advocate putting it in ZK; I'd say heterogeneous node configs are the special case for advanced use here. I'm a fan of a (hypothetical future) world where nodes can be added/removed simply, without need for local configuration. It would be desirable IMHO to have a smooth node add and remove process, and having to install a file into a distribution manually after unpacking it (or having to coordinate variations of config to be pushed to machines) is a minus. If and when autoscaling is happy again I'd like to be able to start an AMI in AWS pointing at zk (or similar) and have it join automatically, then receive replicas to absorb load (per whatever autoscaling is specified), and then be able to issue a single command to sunset the node: one that moves replicas back off of it (again per autoscaling preferences, failing if autoscaling constraints would be violated) and then asks the node to shut down so that the instance in AWS (or wherever) can be shut down safely. This is a Black Friday, new tenants/lost tenants, or new feature/EOL feature sort of use case.
>>
>> Thus IMHO all config for cloud should live somewhere in ZK. File system access should not be required to add/remove capacity. If multiple node configurations need to be supported we should have a nodeTypes directory in ZK (similar to configsets for collections), with possible node-specific configs there and an env var that can be read to determine the type (with some cluster-level designation of a default node type). I think that would be sufficient to parameterize AMI stuff (or containers) by reading tags into env variables.
>>
>> As for knowing what a node loaded, we really should be able to emit any config file we've loaded (without reference to disk or zk). They aren't that big and in most cases don't change that fast, so caching a simple copy as a string in memory (but only if THAT node loaded it) for verification would seem smart. Having a file on disk doesn't tell you whether Solr loaded with that version, or whether it has changed since Solr loaded it, either.
>>
>> Anyway, that's the pie in my sky...
>>
>> -Gus
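The "add a node simply by starting solr pointing to the zookeeper" part above is already roughly a one-liner today when solr.xml lives in ZK (host names, chroot and port below are made up):

    # Start a new SolrCloud node against an existing ZK ensemble; with
    # solr.xml already in ZooKeeper, no node-local config edits are needed.
    bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181/solr -p 8983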
>>
>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>
>>> What I'm really looking for (and currently my understanding is that solr.xml is the only option) is *a cluster config a Solr dev can set as a default* when introducing a new feature for example, so that the config is picked up out of the box in SolrCloud, yet allowing the end user to override it if he so wishes.
>>>
>>> But "cluster config" in this context *with a caveat*: when doing a rolling upgrade, nodes running new code need the new cluster config, nodes running old code need the previous cluster config... Having a per-node solr.xml deployed atomically with the code, as is currently the case, has disadvantages, but solves this problem effectively in a very simple way. If we were to move to a central cluster config, we'd likely need to introduce config versioning or, as Noble suggested elsewhere, only write code that's backward compatible (w.r.t. config), deploy that code everywhere, then once no old code is running, update the cluster config. I find this approach complicated from both a dev and an operational perspective, with unclear added value.
>>>
>>> Ilan
>>>
>>> PS. I've stumbled upon the loading of solr.xml from Zookeeper in the past but couldn't find it as I wrote my message, so I thought I had imagined it...
>>>
>>> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection to ZK for fetching solr.xml, then closes it. It relies on system property waitForZk as the connection timeout (in seconds, defaults to 30) and system property zkHost as the Zookeeper host.
>>>
>>> I believe solr.xml can only end up in ZK through the use of ZkCLI. Then the user is on his own to manage SolrCloud version upgrades: if a new solr.xml is included as part of a new version of SolrCloud, a user having pushed a previous version into ZK will not see the update. I wonder if putting solr.xml in ZK is a common practice.
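For reference, pushing solr.xml up with ZkCLI is roughly the following one-liner (the ZK address and the local path are made up):

    # Upload a local solr.xml into the ZooKeeper root used by SolrCloud.
    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181/solr \
        -cmd putfile /solr.xml /path/to/solr.xml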
>>> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>>
>>>> I interpret solr.xml as the node-local configuration for a single node. clusterprops.json is the cluster-wide configuration applying to all nodes. solrconfig.xml is of course per core, etc.
>>>>
>>>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many of those are picked up in solr.xml (others in bin/solr).
>>>>
>>>> I think it is important to keep a file-local config file which can only be modified if you have shell access to that local node; it provides an extra layer of security. And in certain cases a node may need a different configuration from another node, e.g. during an upgrade.
>>>>
>>>> I put solr.xml in zookeeper. It may have been a mistake, since it may not make all that much sense to load solr.xml, which is a node-level file, from ZK. But if it uses var substitutions for all node-level stuff, it will still work, since those vars are pulled from local properties when parsed anyway.
>>>>
>>>> I'm also somewhat against hijacking clusterprops.json as a general-purpose JSON config file for the cluster. It was supposed to be for simple properties.
>>>>
>>>> Jan
>>>>
>>>> > On 28 Aug 2020, at 14:23, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> >
>>>> > Solr.xml can also exist on Zookeeper, it doesn't _have_ to exist locally. You do have to restart to have any changes take effect.
>>>> >
>>>> > Long ago in a Solr far away, solr.xml was where all the cores were defined. This was before "core discovery" was put in. Since solr.xml had to be there anyway and was read at startup, other global information was added and it's lived on...
>>>> >
>>>> > Then clusterprops.json came along as a place to put, well, cluster-wide properties, so having solr.xml too seems awkward. Although if you do have solr.xml locally on each node, you could theoretically have different settings for different Solr instances. Frankly I consider this more of a bug than a feature.
>>>> >
>>>> > I know there has been some talk about removing solr.xml entirely, but I'm not sure what the thinking is about what to do instead. Whatever we do needs to accommodate standalone. We could do the same trick we do now, and essentially move all the current options in solr.xml to clusterprops.json (or another ZK node) and read it locally for stand-alone. The API could even be used to change it if it was stored locally.
>>>> >
>>>> > That still leaves the chicken-and-egg problem of connecting to ZK in the first place.
>>>> >
>>>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>> >>
>>>> >> I want to ramp up/discuss/inventory configuration options in Solr. Here's my understanding of what exists and what could/should be used depending on the need. Please correct/complete as needed (or point to documentation I might have missed).
>>>> >>
>>>> >> There are currently 3 sources of general configuration I'm aware of:
>>>> >> • Collection-specific config, bootstrapped by file solrconfig.xml and copied into the initial (_default) and then subsequent Config Sets in Zookeeper.
>>>> >> • Cluster-wide config in Zookeeper /clusterprops.json, editable globally through Zookeeper interaction using an API. Not bootstrapped by anything (i.e. it does not exist until the user explicitly creates it).
>>>> >> • Node config file solr.xml, deployed with Solr on each node and loaded when Solr starts. Changes to this file are per node and require a node restart to be taken into account.
>>>> >>
>>>> >> The Collection-specific config (file solrconfig.xml, then in Zookeeper /configs/<config set name>/solrconfig.xml) allows Solr devs to set reasonable defaults (the file is part of the Solr distribution). Content can be changed by users as they create new Config Sets persisted in Zookeeper.
>>>> >>
>>>> >> Zookeeper's /clusterprops.json can be edited through the collection admin API CLUSTERPROP. If users do not set anything there, the file doesn't even exist in Zookeeper, therefore Solr devs cannot use it to set a default cluster config; there's no clusterprops.json file in the Solr distrib like there's a solrconfig.xml.
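Concretely, the file only comes into existence once a cluster property is set through that API, along these lines (host and property are just an illustration; urlScheme is one of the supported cluster properties):

    # Set a cluster property; this creates /clusterprops.json in ZK
    # if it does not exist yet.
    curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"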
>>>> >>
>>>> >> File solr.xml is used by Solr devs to set some reasonable default configuration (parametrized through property files or system properties). There's no API to change that file; users would have to edit/redeploy the file on each node and restart the Solr JVM on that node for the new config to be taken into account.
>>>> >>
>>>> >> Based on the above, my vision (or mental model) of what to use depending on the need:
>>>> >>
>>>> >> solrconfig.xml is the only per-collection config. IMO it does its job correctly: Solr devs can set defaults, users tailor the content to what they need for new config sets. It's the only option for per-collection config anyway.
>>>> >>
>>>> >> The real hesitation could be between solr.xml and Zookeeper /clusterprops.json. What should go where?
>>>> >>
>>>> >> For user configs (anything the user does to the Solr cluster AFTER it was deployed and started), /clusterprops.json seems to be the obvious choice and offers the right abstractions (global config, no need to worry about individual nodes, all nodes pick up configs and changes to configs dynamically).
>>>> >>
>>>> >> For configs that need to be available without requiring user intervention, or that are needed before the connection to ZK is established, there's currently no other choice than using solr.xml. Such configuration obviously includes parameters that are needed to connect to ZK (timeouts, credential provider, and hopefully one day an option to use either direct ZK interaction code or Curator code), but also configuration of general features that should be the default without requiring users to opt in, yet allowing them to easily opt out by editing solr.xml before deploying it to their cluster (in the future, this could include which Lucene version to load in Solr, for example).
>>>> >>
>>>> >> To summarize:
>>>> >> • Collection-specific config? --> solrconfig.xml
>>>> >> • User-provided cluster config once SolrCloud is running? --> ZK /clusterprops.json
>>>> >> • Solr-dev-provided cluster config? --> solr.xml
>>>> >>
>>>> >> Going forward, some (but only some!) of the config that currently can only live in solr.xml could be made to go to /clusterprops.json or another ZK-based config file. This would require adding code to create that ZK file upon initial cluster start (to not force the user to push it) and devising a mechanism (likely a script, could be tricky though) to update that file in ZK when a new release of Solr is deployed and a previous version of that file already exists. Not impossible tasks, but not trivial ones either. Whatever the needs of such an approach are, it might be easier to keep the existing solr.xml as a file and allow users to define overrides in Zookeeper for the configuration parameters from solr.xml that make sense to be overridden in ZK (obviously ZK credentials or connection timeout do not make sense in that context, but defining the shard handler implementation class does, since it is likely loaded after a node has managed to connect to ZK).
>>>> >>
>>>> >> Some config will have to stay in a local node file system file, and only there, no matter what: the Zookeeper timeout definition or any node configuration that is needed before the node connects to Zookeeper.

>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)