I strongly believe that we need to get ZooKeeper out of our clients (that use CloudSolrClient), and use Solr URLs (HTTP) for the cluster state instead. I'm arguing to make this strategic direction clear, and we're already going in the right direction. Realistically, I don't think solrj-zookeeper should be eliminated as it exists for Solr 10 but I could see doing so eventually (no rush!).

Starting with Solr 9.8, I'd like users to start using the Solr HTTP alternative option, encouraged by the release notes. In Solr 10 we can remove any documentation in the ref guide on CloudSolrClient working with ZooKeeper. Javadocs in CloudSolrClient.Builder can recommend Solr URLs instead of the ZooKeeper option. I don't have a strong opinion on exactly when to deprecate it. Today is too soon.
Why:
- Principled: ZooKeeper is conceptually behind Solr; clients shouldn’t talk to it.
- Fewer dependencies for clients (no ZooKeeper or Netty).
- Better security: only Solr should talk to ZooKeeper! Security settings and key configuration files are stored in ZooKeeper.
- Eliminate the impact of ZK storage changes on clients. The change of where the configSet name was stored in ZK is an example. PRS is another. And other changes I’ve seen in a fork.
- Reduce the complexity of SolrJ from an operational standpoint, and its bug risks (e.g. no ZkStateReader there). No ZooKeeper-related configuration (jute.maxbuffer, etc.).
- Reduce the complexity of SolrCloud by limiting the range of use of key classes like ZkStateReader to only be in Solr instead of also existing in SolrJ. For example, it’s not clear if/when LazyCollectionRefs are used in SolrJ, but with this separation it’d be clearer that they couldn’t exist in SolrJ.
- Increase our options for classes in solrj-zookeeper, like adding more dependencies (traces & metrics) without concern of burdening any user/client.
- Reliably working with a collection after collection creation. If you’ve seen waitForActiveCollection after creating a collection in our tests, this is what I mean (and it’s not strictly a test issue). It's sad; make them go away!

Progress has been made on the alternative: Ishan & Noble got the ball rolling years ago to introduce the HTTP alternative option. I call it HttpCSP internally based on an abbreviation of its class name. But I don't think anyone actually uses it, based on how poorly it performed, as reported in JIRA. In Solr 9.1, SolrJ was modularized, creating the "solrj-zookeeper" module (opt-out), and it was made opt-in for Solr 10. Finally, key performance improvements landed in Solr 9.8 for the HTTP option, making it viable for most users (IMO). Credit to my colleagues Haythem & Aparna on some of these. That said, HttpCSP (and CloudSolrClient actually) hasn't reached its ideal state yet.

Some improvement possibilities / problems:
- The cached DocCollection (i.e. a collection's state) expires out of a cache with a hard-coded TTL, even if it’s actively being used. I don’t think it should. It’d lead to poor p99 client-experienced request metrics for those requests that have to additionally fetch the DocCollection, avoidably.
- There’s a DocCollection version staleness mechanism but IMO it’s not robust.
- If all live nodes disappear temporarily (hard cluster restart), I could imagine the client failing permanently. (credit to Ilan)
- CloudSolrClient.getClusterState (and its equivalent method on the provider) goes from a trivial getter to a slow remote call fetching the entire cluster’s state; no cache. We have code using it in various places; surely users do too. ClusterState has issues (out of scope of this post), so I want to deprecate getClusterState so that the client never touches ClusterState. Live nodes, DocCollection, and cluster properties would still be accessible though.

The last one, basically banning ClusterState in SolrJ, is the biggest performance trap / issue and needs to be prioritized; I plan to create a JIRA or two.

I suppose I could make a SIP out of this... albeit maybe the time for that was years ago when HttpCSP came into existence. I'm just trying to see this through to a conclusion.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
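
To make the hard-coded TTL problem listed above concrete, here is a minimal sketch of the difference between evicting on a hard TTL and keeping an in-use entry while refreshing it off the request path. This is not SolrJ code. The class name, the fetcher function, the injected clock and executor, and the serve-stale-while-refreshing policy are all assumptions made for illustration; the post only says that actively used state shouldn't expire, not how to fix it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a client-side collection-state cache; this is not SolrJ code. */
class CollectionStateCacheSketch<V> {

  private static final class Entry<V> {
    final V state;
    final long loadedAtMs;

    Entry(V state, long loadedAtMs) {
      this.state = state;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
  private final Function<String, V> fetcher; // stands in for the remote state fetch
  private final LongSupplier clockMs;        // injectable clock, so the sketch is testable
  private final Executor refresher;          // where background refreshes run
  private final long ttlMs;
  private final boolean serveStaleWhileRefreshing;

  /** Fetches a caller had to wait for; these are the ones that hurt tail latency. */
  final AtomicInteger blockingFetches = new AtomicInteger();
  /** Fetches handed to the refresher while the caller kept using the cached state. */
  final AtomicInteger backgroundFetches = new AtomicInteger();

  CollectionStateCacheSketch(Function<String, V> fetcher, LongSupplier clockMs,
      Executor refresher, long ttlMs, boolean serveStaleWhileRefreshing) {
    this.fetcher = fetcher;
    this.clockMs = clockMs;
    this.refresher = refresher;
    this.ttlMs = ttlMs;
    this.serveStaleWhileRefreshing = serveStaleWhileRefreshing;
  }

  V get(String collection) {
    long now = clockMs.getAsLong();
    Entry<V> cached = cache.get(collection);
    if (cached != null && now - cached.loadedAtMs < ttlMs) {
      return cached.state; // fresh hit
    }
    if (cached != null && serveStaleWhileRefreshing) {
      // Expired but in use: return the cached state and refresh via the executor.
      // A real implementation would also de-duplicate concurrent refreshes.
      refresher.execute(() -> {
        backgroundFetches.incrementAndGet();
        load(collection);
      });
      return cached.state;
    }
    // Cold miss, or hard-TTL behavior: the caller waits for the fetch.
    blockingFetches.incrementAndGet();
    return load(collection).state;
  }

  private Entry<V> load(String collection) {
    Entry<V> fresh = new Entry<>(fetcher.apply(collection), clockMs.getAsLong());
    cache.put(collection, fresh);
    return fresh;
  }

  public static void main(String[] args) {
    long[] nowMs = {0L};
    for (boolean serveStale : new boolean[] {false, true}) {
      nowMs[0] = 0L;
      // Runnable::run executes the refresh inline; a real client would use a thread pool.
      CollectionStateCacheSketch<String> cache = new CollectionStateCacheSketch<String>(
          name -> "state-of-" + name, () -> nowMs[0], Runnable::run, 1_000L, serveStale);
      cache.get("techproducts"); // cold miss: always a blocking fetch
      nowMs[0] = 1_500L;         // past the TTL, collection still in use
      cache.get("techproducts");
      System.out.println("serveStale=" + serveStale
          + " blockingFetches=" + cache.blockingFetches.get()
          + " backgroundFetches=" + cache.backgroundFetches.get());
    }
  }
}
```

With the hard TTL, the second request pays for a fetch on the request path; with the serve-stale policy, only the cold miss blocks. The trade-off is that a caller may briefly act on outdated state, so a sketch like this would still depend on a robust staleness check.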