I've been looking at HttpClusterStateProvider lately, and at ClusterState. The provider has a method getClusterState which goes and loads the complete ClusterState (all collections with all state info). ClusterState is immutable. At massive collection scale, such a method is very disconcerting! Thankfully, there's a method getState(collection) returning a CollectionRef (a holder of DocCollection), implemented by fetching only the state of the pertinent collection. Likewise, the live nodes can be retrieved directly from ClusterStateProvider without going through ClusterState.
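To make the difference between the two access patterns concrete, here is a minimal, self-contained sketch. It deliberately does not use the SolrJ classes: CollectionState, FakeStore, Provider and their methods are hypothetical stand-ins invented for illustration, and the fetch counter only approximates what a real round trip to ZooKeeper or a remote node would cost.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ClusterStateSketch {

    /** Hypothetical stand-in for DocCollection: only a name and a version. */
    static final class CollectionState {
        final String name;
        final int version;

        CollectionState(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for the remote state store; counts per-collection fetches. */
    static final class FakeStore {
        final Map<String, CollectionState> collections = new LinkedHashMap<>();
        final Set<String> liveNodes = new TreeSet<>();
        int collectionFetches = 0;

        CollectionState fetch(String name) {
            collectionFetches++;
            return collections.get(name);
        }
    }

    /** Hypothetical provider exposing both access patterns. */
    static final class Provider {
        private final FakeStore store;

        Provider(FakeStore store) {
            this.store = store;
        }

        /** Analogous to loading the complete cluster state: one fetch per collection. */
        Map<String, CollectionState> loadEverything() {
            Map<String, CollectionState> all = new LinkedHashMap<>();
            for (String name : store.collections.keySet()) {
                all.put(name, store.fetch(name));
            }
            return all;
        }

        /** Analogous to fetching one collection's state: a single fetch. */
        CollectionState loadOne(String name) {
            return store.fetch(name);
        }

        /** Live nodes are read directly, without touching any collection state. */
        Set<String> liveNodes() {
            return Set.copyOf(store.liveNodes);
        }
    }

    /** Builds a store with n collections named c0..c(n-1) and one live node. */
    static FakeStore storeWith(int n) {
        FakeStore store = new FakeStore();
        for (int i = 0; i < n; i++) {
            store.collections.put("c" + i, new CollectionState("c" + i, 1));
        }
        store.liveNodes.add("node1");
        return store;
    }

    public static void main(String[] args) {
        FakeStore store = storeWith(1000);
        Provider provider = new Provider(store);

        provider.loadOne("c42");
        System.out.println("fetches after loadOne: " + store.collectionFetches);

        provider.loadEverything();
        System.out.println("fetches after loadEverything: " + store.collectionFetches);

        System.out.println("live nodes: " + provider.liveNodes());
    }
}
```

In this toy model the cost of loadOne stays constant while the cost of loadEverything grows with the number of collections, which is the concern at many-collection scale.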
I'd like to make a bold proposal: merge ClusterState with ClusterStateProvider, keeping the ClusterState name & package and all/most of its API methods. This means it would lose its immutability designation. If an immutable variation is needed, one could exist. Don't include methods like getCollectionsMap, which is evil at many-collection scale. Listing/looping over collections should be done sparingly; don't make it too easy to do by accident. Possibly also move CloudSolrClient's StateCache (a cache of DocCollection keyed by collection name) into the new & improved ClusterState. The end-game is ClusterState being the place where we can list live nodes, aliases, and collections, and most importantly where a cache of DocCollection lives, all with an eventually consistent mindset: anything can be out of date and may need to be re-fetched.

Has anyone thought along similar lines, or does anyone have concerns about such a pursuit?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley