Re: [DISCUSS] ConfigSet ZK to file system fallback

Eric Pugh Fri, 22 Jan 2021 05:19:30 -0800

There is a lot in here ;-).  

With the caveat that I don’t have the recent experience that many of you have 
with massive Solr clusters, I think that we need to commit to fewer, not more, 
ways of maintaining the supporting resources that these clusters need.  I’d 
like to see ways of managing our Solr clusters that encourage easy change and 
experimentation, and encourage us to separate the physical layer (version of 
Solr, networking setup, packages used) from the logical layer (individual 
collections and their supporting code and resources).

I think the configSet was a huge jump forward.  My workflow is to think:
1) What’s unusual about this Solr setup?  What does the physical layer need to 
be?  Special package?  Special code?  Build a Docker image.
2) Fire up a three-node Solr cluster and wait until it’s up and responsive, 
checking via the APIs.
3) Now think about my specific use case.  What collections do I need?  Is it 
just 1, or is it 5 or 10 collections?  Are they on the same configSet or 
different ones?  Great, zip up the configSet and pop it into Solr via the APIs.
4) Create the collections in the shapes I need with the APIs, and now start 
iterating on what I need to do.  Use the APIs to create fields, or set up 
different ParamSets.
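
Steps 3 and 4 can be sketched roughly as below. This is a minimal Python 
sketch, not a definitive client: the helper names, the "films" names used in 
the usage note, and the localhost address are assumptions made for 
illustration. The endpoints are the Configsets API (action=UPLOAD), the 
Collections API (action=CREATE) and the Schema API ("add-field"). The helpers 
only build the requests, so nothing here needs a running cluster.

```python
import io
import json
import zipfile
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumption: a local node on the default port


def zip_configset(files):
    """Zip {relative path: bytes} in memory, with the files at the zip root."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, data in sorted(files.items()):
            zf.writestr(path, data)
    return buf.getvalue()


def upload_configset_url(name):
    """URL to POST the zip bytes to (Configsets API, action=UPLOAD)."""
    return f"{SOLR}/admin/configs?" + urlencode({"action": "UPLOAD", "name": name})


def create_collection_url(name, configset, shards=1, replicas=3):
    """URL that creates a collection on an already uploaded configSet."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": configset,
        "numShards": shards,
        "replicationFactor": replicas,
    }
    return f"{SOLR}/admin/collections?" + urlencode(params)


def add_field_request(collection, field_name, field_type):
    """(url, JSON body) for a Schema API add-field command."""
    url = f"{SOLR}/{collection}/schema"
    body = json.dumps(
        {"add-field": {"name": field_name, "type": field_type, "stored": True}}
    )
    return url, body
```

For example, zip_configset({"solrconfig.xml": ..., "managed-schema": ...}) 
gives the bytes to POST to upload_configset_url("films") with 
Content-Type: application/octet-stream, after which 
create_collection_url("films", "films") and 
add_field_request("films", "title", "text_general") cover step 4.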

However, with configSets we only did half the job, because we still don’t have 
a single well-understood way of handling Jars and other resources.  We have 
many ways of doing it, which generates constant user confusion and 
contributes to the perspective that “Solr is hard to use”.

Right now, across the Solr landscape I can think of many ways of adding 
“external” files to my Solr:

1) The classic ./lib as a place to put things.
2) The new-to-me solr.allow.unsafe.resourceloading=true approach.
3) The userfiles directory in Solr, accessed by the streaming expressions load 
function.
4) The “package store” for packages, located in the file store.
5) The blob store .system concept from before the package store.
6) The LTR feature store (which I guess is backed by ZK but could be on 
disk as well through more hoops...).
7) Layering stuff in directly via Docker build files.

These are each a little different, with varying levels of support.  

Let’s figure out how we can include a resource that is 10 KB, 1 MB or 1 GB and 
not have to think about ZooKeeper or any of the other implementation details of 
what backs it.  Let’s figure out where the package manager is letting us down 
and keep working on it.

> On Jan 22, 2021, at 12:16 AM, David Smiley <dsmi...@apache.org> wrote:
> 
> Summary:  I've been contemplating a simple enhancement to how SolrCloud 
> resolves files in a configSet:  when a file isn't in ZooKeeper, fallback 
> resolution to the same-named configset on the file system (which normally is 
> ignored in SolrCloud today).  A further fallback to _default on the 
> filesystem could be useful as well. The mutable space is always ZK if you 
> edit a schema or configOverlay.json or whatever.
> 
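
The lookup order proposed above can be sketched roughly as follows. This is 
an illustrative Python sketch, not Solr code: a plain dict stands in for 
ZooKeeper, and the function name, the /configs/<name>/ key layout and the 
<root>/<name>/conf directory layout are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def resolve_config_file(configset, filename, zk_files, fs_root):
    """Return (source, bytes) from the first location that has the file."""
    # 1) ZooKeeper (here: a dict) stays the primary, mutable location.
    key = f"/configs/{configset}/{filename}"
    if key in zk_files:
        return "zk", zk_files[key]
    # 2) The same-named configSet on disk, then 3) _default on disk.
    for candidate in (configset, "_default"):
        path = Path(fs_root) / candidate / "conf" / filename
        if path.is_file():
            return f"fs:{candidate}", path.read_bytes()
    raise FileNotFoundError(f"{filename} not found for configSet {configset}")
```

With managed-schema only in the dict and solrconfig.xml only under 
<root>/films/conf, the first resolves from "zk" and the second from 
"fs:films"; a file present only under <root>/_default/conf resolves from 
"fs:_default".
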
> My primary motivation is allowing for upgrades to plugins, configs, or Solr 
> itself to be easier in some scenarios (certainly not all!).  Imagine that 
> you've got configOverlay.json (with some handlers defined) & params.json & 
> schema.xml in ZK, and solrconfig.xml on the file system, plus some partial 
> xml file of schema field types that is "xi:include"-ed by schema.xml.  Assume 
> that a custom Solr Docker image is used including custom plugins, and with 
> this configSet baked in.  One day you add some new token filters, add a new 
> Lucene merge policy, and remove some outdated update request processor.  You 
> do plugin code changes and xi:included field type changes and edit 
> solrconfig.xml, and build this into your latest company Solr Docker image, 
> and you get it deployed using Kubernetes.  Those changes can be safe to 
> deploy without touching any ZK resident configSet.  Other changes might not 
> be (e.g. removing a field type that is referenced, etc. or doing changes to 
> analyzed text that are too incompatible requiring a re-index) but my point is 
> that some are, and this would be easier.
> 
> An additional motivation is storing large relatively static common resources 
> on the file system.  Where I work, I've got over a gig of them :-). This can 
> be worked around with solr.allow.unsafe.resourceloading=true but... it'd be 
> nice to not have to resort to that.
> 
> Another benefit would be to make it easier to separate one's own 
> configuration from that of the _default configSet you took from Solr when 
> starting a new project.  Resolving differences and then doing Solr upgrades 
> was a common task I had to do as a consultant and in my own Solr upgrades.  
> Granted this is possible today but perhaps if this overlay was 
> emphasized/embraced more, it would lead to this outcome.  It's still a 
> problem that a bare-bones solrconfig.xml & schema.xml are either too 
> bare-bones or say too much, and it's a separate issue for Solr to improve 
> that.
> 
> Probably secondary related issue: If the SolrCloud configSet ZK node were to 
> be optional instead of required (thus assume the configSet is entirely on the 
> file system), it would bring other benefits.  It would allow users to use the 
> "file store" or some network mounted storage (NFS) as the configSet location. 
>  It would accelerate experimentation with SolrCloud in docker locally. The 
> biggest PITA anyone notices when first exploring SolrCloud is that configs 
> are fundamentally not on the file system despite you seeing them there; it's 
> all in ZK.  And there's no super convenient way to edit the configuration, 
> not even a web UI.  Using the file system for configSets would be especially 
> nice when doing local SolrCloud experimentation in Docker, eliminating an 
> annoying configSet deployment step.
> 
> I plan to file an issue of course but I think this deserved a dev list 
> discussion.
> 
> I know the new package manager could help with my primary motivating 
> use-case, but I think there are too many obstacles there, at least at 
> present.  A file system fallback is a simple thing by comparison.
> 
> Question:  Does the k8s Solr Operator do anything to make configSet & plugin 
> upgrades better?
> 
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley 
> <http://www.linkedin.com/in/davidwsmiley>
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

