I want to ramp up on, discuss and inventory configuration options in Solr. Here's
my understanding of what exists and what could/should be used depending on
the need. Please correct/complete as needed (or point to documentation I
might have missed).


*There are currently 3 sources of general configuration I'm aware of:*

   - Collection specific config, bootstrapped from the file *solrconfig.xml*
   and copied into the initial (_default) and then subsequent Config Sets in
   Zookeeper.
   - Cluster wide config in Zookeeper's */clusterprops.json*, editable
   globally through an API that updates Zookeeper. Not bootstrapped by
   anything (i.e. it does not exist until the user explicitly creates it).
   - Node config file *solr.xml*, deployed with Solr on each node and loaded
   when Solr starts. Changes to this file are per node and require a node
   restart to be taken into account.

The Collection specific config (file solrconfig.xml then in Zookeeper
/configs/*<config set name>*/solrconfig.xml) allows Solr devs to set
reasonable defaults (the file is part of the Solr distribution). Content
can be changed by users as they create new Config Sets persisted in
Zookeeper.

Zookeeper's /clusterprops.json can be edited through the collection admin
API CLUSTERPROP. If users do not set anything there, the file doesn't even
exist in Zookeeper. Therefore Solr devs cannot use it to set a default
cluster config: there's no clusterprops.json file in the Solr distrib like
there's a solrconfig.xml.
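
For reference, here is a minimal sketch (Python, purely illustrative) of
what a CLUSTERPROP call looks like. The host, port and the choice of the
urlScheme property are assumptions for the example, and the helper name
clusterprop_url is mine. The code only builds the request URL, it does not
contact Solr.

```python
from urllib.parse import urlencode


def clusterprop_url(base_url, name, value=None):
    """Build a Collections API CLUSTERPROP request URL.

    Leaving out the value corresponds to asking Solr to unset the property.
    """
    params = {"action": "CLUSTERPROP", "name": name}
    if value is not None:
        params["val"] = value
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Assumed local node; sets the cluster-wide urlScheme property to https.
print(clusterprop_url("http://localhost:8983/solr", "urlScheme", "https"))
```

Issuing a GET on the resulting URL is what creates /clusterprops.json in
Zookeeper if it does not exist yet.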

File solr.xml is used by Solr devs to set some reasonable default
configuration (parametrized through property files or system properties).
There's no API to change that file: users would have to edit/redeploy the
file on each node and restart the Solr JVM on that node for the new config
to be taken into account.
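
To illustrate the "parametrized through system properties" part: solr.xml
entries use ${name:default} placeholders. Below is a rough Python sketch
of that substitution idea. It is not Solr's actual implementation, the
resolve helper is hypothetical, and the sample line is only meant to
resemble what the stock solr.xml contains.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, props):
    """Replace ${name:default} placeholders using props, else the default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")
    return _PLACEHOLDER.sub(substitute, text)


line = '<int name="zkClientTimeout">${zkClientTimeout:30000}</int>'
print(resolve(line, {}))                            # falls back to the default
print(resolve(line, {"zkClientTimeout": "15000"}))  # property overrides it
```

The point is that the defaults ship with the file, and overriding them
happens per node at JVM start, not through any cluster-wide API.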



*Based on the above, my vision (or mental model) of what to use depending
on the need:*

solrconfig.xml is the only per collection config. IMO it does
its job correctly: Solr devs can set defaults, users tailor the content to
what they need for new config sets. It's the only option for per collection
config anyway.

The real hesitation could be between solr.xml and Zookeeper
/clusterprops.json. What should go where?

For user configs (anything the user does to the Solr cluster AFTER it was
deployed and started), /clusterprops.json seems to be the obvious choice
and offers the right abstractions (global config, no need to worry about
individual nodes, all nodes pick up configs and changes to configs
dynamically).

For configs that need to be available without requiring user intervention
or needed before the connection to ZK is established, *there's currently no
other choice than using solr.xml*. Such configuration obviously includes
parameters that are needed to connect to ZK (timeouts, credential provider
and hopefully one day an option to either use direct ZK interaction code or
Curator code), but also configuration of general features that should be
the default without requiring users to opt in yet allowing them to easily
opt out by editing solr.xml before deploying to their cluster (in the
future, this could include which Lucene version to load in Solr for
example).

To summarize:

   - Collection specific config? --> solrconfig.xml
   - User provided cluster config once SolrCloud is running? --> ZK
   /clusterprops.json
   - Solr dev provided cluster config? --> solr.xml
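
The same rule of thumb written as a tiny Python function, just to make the
decision order explicit. This is only my mental model from this email, and
the function and parameter names are made up for the illustration.

```python
def config_source(per_collection, needed_before_zk, shipped_default):
    """Return where a setting should live, following the summary above."""
    if per_collection:
        return "solrconfig.xml"
    # Anything needed before ZK is reachable, or that Solr devs must ship
    # as a default, currently has to live in the node-local file.
    if needed_before_zk or shipped_default:
        return "solr.xml"
    # Everything else: user-provided, cluster-wide, changeable at runtime.
    return "/clusterprops.json"


print(config_source(per_collection=False, needed_before_zk=True,
                    shipped_default=False))
```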


Going forward, some (but only some!) of the config that currently can only
live in solr.xml could be made to go to /clusterprops.json or another ZK
based config file. This would require adding code to create that ZK file
upon initial cluster start (to not force the user to push it) and devising a
mechanism (likely a script, could be tricky though) to update that file in
ZK when a new release of Solr is deployed and a previous version of that
file already exists. Not impossible tasks, but not trivial ones either.

Whatever such an approach would require, it might be easier to keep the
existing solr.xml as a file and allow users to define overrides in
Zookeeper for the configuration parameters from solr.xml that make sense to
be overridden in ZK (obviously ZK credentials or connection timeout do not
make sense in that context, but defining the shard handler implementation
class does since it is likely loaded after a node managed to connect to ZK).
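
A rough Python sketch of what such an override scheme could look like.
Nothing like this exists in Solr today as far as I know: the key names,
the LOCAL_ONLY set and the effective_config helper are all assumptions
made up for the example.

```python
# Settings needed before the node can reach ZK; overrides stored in ZK
# are ignored for these (hypothetical list).
LOCAL_ONLY = frozenset({"zkClientTimeout", "zkCredentialsProvider"})


def effective_config(solr_xml, zk_overrides):
    """Start from the local solr.xml values, then apply allowed ZK overrides."""
    merged = dict(solr_xml)
    for key, value in zk_overrides.items():
        if key in LOCAL_ONLY:
            continue  # must come from the node-local file only
        merged[key] = value
    return merged


local = {"zkClientTimeout": "30000",
         "shardHandlerFactory": "HttpShardHandlerFactory"}
overrides = {"zkClientTimeout": "1",
             "shardHandlerFactory": "com.example.MyShardHandlerFactory"}
print(effective_config(local, overrides))
```

Here the shard handler class is taken from ZK while the ZK timeout keeps
its solr.xml value, which is the split described above.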

Some config will have to stay in a file on the node's local file system,
and only there, no matter what: the Zookeeper timeout definition or any node
configuration that is needed before the node connects to Zookeeper.
