Which means whoever wants to make changes to Solr needs to be able/willing/competent to build AMIs/Docker images/etc. ... and one has to manage versions of those image variants as opposed to managing versions of config files.
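To make that concrete, baking the node-level config into the image (the approach Tomás describes below) would look roughly like this sketch; the base image tag and the target path are assumptions and depend on how SOLR_HOME is laid out in the image:

    FROM solr:8.6
    # Bake the node-level config into the image so it is versioned and
    # rolled out together with the binaries. The destination path is an
    # assumption; it has to match wherever this image actually reads
    # solr.xml from (SOLR_HOME or the default server/solr directory).
    COPY solr.xml /opt/solr/server/solr/solr.xml

Every change to solr.xml then means building, tagging and rolling out a new image instead of editing a file in place.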
On Fri, Aug 28, 2020 at 1:55 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:

> I think if you are using AMIs (or Docker), you could put the node configuration inside the AMI (or Docker image), as Ilan said, together with the binaries. Say you have a custom top-level handler (Collections, Cores, Info, whatever) which takes some arguments and is configured in solr.xml, and you are doing an upgrade: you probably want your old nodes (running with your old AMI/Docker image with old jars) to keep the old configuration and your new nodes to use the new.
>
> On Fri, Aug 28, 2020 at 10:42 AM Gus Heck <gus.h...@gmail.com> wrote:
>
>> Putting solr.xml in zookeeper means you can add a node simply by starting solr pointing to the zookeeper, and ensure a consistent solr.xml for the new node if you've customized it. Since I rarely (never) hit use cases where I need a different per-node solr.xml, I generally advocate putting it in ZK; I'd say heterogeneous node configs are the special case for advanced use here. I'm a fan of a (hypothetical future) world where nodes can be added/removed simply, without need for local configuration. It would be desirable IMHO to have a smooth node add and remove process, and having to install a file into a distribution manually after unpacking it (or having to coordinate variations of config to be pushed to machines) is a minus. If and when autoscaling is happy again I'd like to be able to start an AMI in AWS pointing at zk (or similar) and have it join automatically, then receive replicas to absorb load (per whatever autoscaling is specified), and then be able to issue a single command to sunset the node: one that moves replicas back off of it (again per autoscaling preferences, failing if autoscaling constraints would be violated) and then asks the node to shut down so that the instance in AWS (or wherever) can be shut down safely. This is a Black Friday, new tenants/lost tenants, or new feature/EOL feature sort of use case.
>>
>> Thus IMHO all config for cloud should live somewhere in ZK. File system access should not be required to add/remove capacity. If multiple node configurations need to be supported we should have a nodeTypes directory in ZK (similar to configsets for collections), with possible node-specific configs there and an env var that can be read to determine the type (with some cluster-level designation of a default node type). I think that would be sufficient to parameterize AMI stuff (or containers) by reading tags into env variables.
>>
>> As for knowing what a node loaded, we really should be able to emit any config file we've loaded (without reference to disk or zk). They aren't that big and in most cases don't change that fast, so caching a simple copy as a string in memory (but only if THAT node loaded it) for verification would seem smart. Having a file on disk doesn't tell you whether Solr loaded with that version, or whether it has changed since Solr loaded it, either.
>>
>> Anyway, that's the pie in my sky...
>>
>> -Gus
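The "add a node simply by starting solr pointing to the zookeeper" part above is already roughly a one-liner today when solr.xml lives in ZK (host names, chroot and port below are made up):

    # Start a new SolrCloud node against an existing ZK ensemble; with
    # solr.xml already in ZooKeeper, no node-local config edits are needed.
    bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181/solr -p 8983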
>>
>> On Fri, Aug 28, 2020 at 11:51 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>
>>> What I'm really looking for (and currently my understanding is that solr.xml is the only option) is *a cluster config a Solr dev can set as a default* when introducing a new feature for example, so that the config is picked up out of the box in SolrCloud, yet allowing the end user to override it if he so wishes.
>>>
>>> But "cluster config" in this context *with a caveat*: when doing a rolling upgrade, nodes running new code need the new cluster config, nodes running old code need the previous cluster config... Having a per-node solr.xml deployed atomically with the code, as is currently the case, has disadvantages, but solves this problem effectively in a very simple way. If we were to move to a central cluster config, we'd likely need to introduce config versioning or, as Noble suggested elsewhere, only write code that's backward compatible (w.r.t. config), deploy that code everywhere, then once no old code is running, update the cluster config. I find this approach complicated from both a dev and an operational perspective, with unclear added value.
>>>
>>> Ilan
>>>
>>> PS. I've stumbled upon the loading of solr.xml from Zookeeper in the past but couldn't find it as I wrote my message, so I thought I had imagined it...
>>>
>>> It's in SolrDispatchFilter.loadNodeConfig(). It establishes a connection to ZK for fetching solr.xml, then closes it. It relies on system property waitForZk as the connection timeout (in seconds, defaults to 30) and system property zkHost as the Zookeeper host.
>>>
>>> I believe solr.xml can only end up in ZK through the use of ZkCLI. Then the user is on his own to manage SolrCloud version upgrades: if a new solr.xml is included as part of a new version of SolrCloud, a user having pushed a previous version into ZK will not see the update. I wonder if putting solr.xml in ZK is a common practice.
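For reference, pushing solr.xml up with ZkCLI is roughly the following one-liner (the ZK address and the local path are made up):

    # Upload a local solr.xml into the ZooKeeper root used by SolrCloud.
    server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181/solr \
        -cmd putfile /solr.xml /path/to/solr.xml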
>>> On Fri, Aug 28, 2020 at 4:58 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>>
>>>> I interpret solr.xml as the node-local configuration for a single node. clusterprops.json is the cluster-wide configuration applying to all nodes. solrconfig.xml is of course per core, etc.
>>>>
>>>> solr.in.sh is the per-node ENV-VAR way of configuring a node, and many of those are picked up in solr.xml (others in bin/solr).
>>>>
>>>> I think it is important to keep a file-local config file which can only be modified if you have shell access to that local node; it provides an extra layer of security. And in certain cases a node may need a different configuration from another node, e.g. during an upgrade.
>>>>
>>>> I put solr.xml in zookeeper. It may have been a mistake, since it may not make all that much sense to load solr.xml, which is a node-level file, from ZK. But if it uses var substitutions for all node-level stuff, it will still work, since those vars are pulled from local properties when parsed anyway.
>>>>
>>>> I'm also somewhat against hijacking clusterprops.json as a general-purpose JSON config file for the cluster. It was supposed to be for simple properties.
>>>>
>>>> Jan
>>>>
>>>> > On 28 Aug 2020, at 14:23, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> >
>>>> > Solr.xml can also exist on Zookeeper, it doesn't _have_ to exist locally. You do have to restart to have any changes take effect.
>>>> >
>>>> > Long ago in a Solr far away, solr.xml was where all the cores were defined. This was before "core discovery" was put in. Since solr.xml had to be there anyway and was read at startup, other global information was added and it's lived on...
>>>> >
>>>> > Then clusterprops.json came along as a place to put, well, cluster-wide properties, so having solr.xml too seems awkward. Although if you do have solr.xml locally on each node, you could theoretically have different settings for different Solr instances. Frankly I consider this more of a bug than a feature.
>>>> >
>>>> > I know there has been some talk about removing solr.xml entirely, but I'm not sure what the thinking is about what to do instead. Whatever we do needs to accommodate standalone. We could do the same trick we do now, and essentially move all the current options in solr.xml to clusterprops.json (or another ZK node) and read it locally for stand-alone. The API could even be used to change it if it was stored locally.
>>>> >
>>>> > That still leaves the chicken-and-egg problem of connecting to ZK in the first place.
>>>> >
>>>> >> On Aug 28, 2020, at 7:43 AM, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>> >>
>>>> >> I want to ramp up/discuss/inventory configuration options in Solr. Here's my understanding of what exists and what could/should be used depending on the need. Please correct/complete as needed (or point to documentation I might have missed).
>>>> >>
>>>> >> There are currently 3 sources of general configuration I'm aware of:
>>>> >> • Collection-specific config, bootstrapped by file solrconfig.xml and copied into the initial (_default) and then subsequent Config Sets in Zookeeper.
>>>> >> • Cluster-wide config in Zookeeper /clusterprops.json, editable globally through Zookeeper interaction using an API. Not bootstrapped by anything (i.e. it does not exist until the user explicitly creates it).
>>>> >> • Node config file solr.xml, deployed with Solr on each node and loaded when Solr starts. Changes to this file are per node and require a node restart to be taken into account.
>>>> >>
>>>> >> The Collection-specific config (file solrconfig.xml, then in Zookeeper /configs/<config set name>/solrconfig.xml) allows Solr devs to set reasonable defaults (the file is part of the Solr distribution). Content can be changed by users as they create new Config Sets persisted in Zookeeper.
>>>> >>
>>>> >> Zookeeper's /clusterprops.json can be edited through the collection admin API CLUSTERPROP. If users do not set anything there, the file doesn't even exist in Zookeeper, therefore Solr devs cannot use it to set a default cluster config; there's no clusterprops.json file in the Solr distrib like there's a solrconfig.xml.
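Concretely, the file only comes into existence once a cluster property is set through that API, along these lines (host and property are just an illustration; urlScheme is one of the supported cluster properties):

    # Set a cluster property; this creates /clusterprops.json in ZK
    # if it does not exist yet.
    curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https"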
>>>> >>
>>>> >> File solr.xml is used by Solr devs to set some reasonable default configuration (parametrized through property files or system properties). There's no API to change that file; users would have to edit/redeploy the file on each node and restart the Solr JVM on that node for the new config to be taken into account.
>>>> >>
>>>> >> Based on the above, my vision (or mental model) of what to use depending on the need:
>>>> >>
>>>> >> solrconfig.xml is the only per-collection config. IMO it does its job correctly: Solr devs can set defaults, users tailor the content to what they need for new config sets. It's the only option for per-collection config anyway.
>>>> >>
>>>> >> The real hesitation could be between solr.xml and Zookeeper /clusterprops.json. What should go where?
>>>> >>
>>>> >> For user configs (anything the user does to the Solr cluster AFTER it was deployed and started), /clusterprops.json seems to be the obvious choice and offers the right abstractions (global config, no need to worry about individual nodes, all nodes pick up configs and changes to configs dynamically).
>>>> >>
>>>> >> For configs that need to be available without requiring user intervention, or that are needed before the connection to ZK is established, there's currently no other choice than using solr.xml. Such configuration obviously includes parameters that are needed to connect to ZK (timeouts, credential provider, and hopefully one day an option to use either direct ZK interaction code or Curator code), but also configuration of general features that should be the default without requiring users to opt in, yet allowing them to easily opt out by editing solr.xml before deploying it to their cluster (in the future, this could include which Lucene version to load in Solr, for example).
>>>> >>
>>>> >> To summarize:
>>>> >> • Collection-specific config? --> solrconfig.xml
>>>> >> • User-provided cluster config once SolrCloud is running? --> ZK /clusterprops.json
>>>> >> • Solr-dev-provided cluster config? --> solr.xml
>>>> >>
>>>> >> Going forward, some (but only some!) of the config that currently can only live in solr.xml could be made to go to /clusterprops.json or another ZK-based config file. This would require adding code to create that ZK file upon initial cluster start (to not force the user to push it) and devising a mechanism (likely a script, could be tricky though) to update that file in ZK when a new release of Solr is deployed and a previous version of that file already exists. Not impossible tasks, but not trivial ones either. Whatever the needs of such an approach are, it might be easier to keep the existing solr.xml as a file and allow users to define overrides in Zookeeper for the configuration parameters from solr.xml that make sense to be overridden in ZK (obviously ZK credentials or connection timeout do not make sense in that context, but defining the shard handler implementation class does, since it is likely loaded after a node has managed to connect to ZK).
>>>> >>
>>>> >> Some config will have to stay in a local node file system file, and only there, no matter what: the Zookeeper timeout definition or any node configuration that is needed before the node connects to Zookeeper.

>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)