I've been looking at HttpClusterStateProvider lately, and at ClusterState.
The provider has a method getClusterState that loads the complete
ClusterState (all collections with all state info).  ClusterState is
immutable.  At a massive collection scale, such a method is very
disconcerting!  Thankfully, there's a method getState(collection)
returning a CollectionRef (holder of DocCollection) implemented by
fetching only the state of the pertinent collection.  Likewise the
live nodes can be retrieved directly from ClusterStateProvider without
going through ClusterState.
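
To make the cost difference concrete, here is a rough sketch.  It does
not use SolrJ at all: FetchCostSketch, loadState, fetchAll and fetchOne
are hypothetical stand-ins I made up, with fetchAll playing the role of
getClusterState and fetchOne the role of getState(collection).  The
counter only tallies how many per-collection states get loaded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a remote state source; not a SolrJ class. */
class FetchCostSketch {
  /** How many per-collection states have been loaded so far. */
  static int statesLoaded = 0;

  /** Assumed collection names, purely for illustration. */
  static final List<String> COLLECTIONS = List.of("c1", "c2", "c3", "c4", "c5");

  /** Pretend remote read of a single collection's state. */
  static String loadState(String collection) {
    statesLoaded++;
    return "state-of-" + collection;
  }

  /** Plays the role of getClusterState: loads every collection's state. */
  static Map<String, String> fetchAll() {
    Map<String, String> all = new LinkedHashMap<>();
    for (String c : COLLECTIONS) {
      all.put(c, loadState(c));
    }
    return all;
  }

  /** Plays the role of getState(collection): loads only the one asked for. */
  static String fetchOne(String collection) {
    return loadState(collection);
  }

  public static void main(String[] args) {
    fetchAll();
    System.out.println("fetchAll loaded " + statesLoaded + " states");
    statesLoaded = 0;
    fetchOne("c3");
    System.out.println("fetchOne loaded " + statesLoaded + " state(s)");
  }
}
```

The load count of fetchAll grows with the number of collections, while
fetchOne stays at one regardless of cluster size.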

I'd like to make a bold proposal: Merge ClusterState with
ClusterStateProvider, keeping the same ClusterState name & package and
all/most API methods.  This means it would lose its immutability
designation.  If an immutable variation is needed, one could exist.

The merged class should not include methods like getCollectionsMap,
which is evil at many-collection scale.  Listing/looping collections
should be done sparingly; don't make it too easy to do by accident.

Possibly also move CloudSolrClient's StateCache (a cache of
DocCollection keyed by collection name) into the new & improved
ClusterState.

The end-game is ClusterState being the place where we can list live
nodes, aliases, and collections, and most importantly where a cache of
DocCollection lives.  It would come with an eventually consistent
mind-set: anything can be out of date and may need to be re-fetched.
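
A minimal sketch of that shape, under heavy assumptions: this is not
SolrJ code and not a proposed API.  CachedClusterView, CollectionState,
getCollection and invalidate are hypothetical names, CollectionState is
a toy stand-in for DocCollection, and the fetcher function stands in for
whatever actually talks to the cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a mutable, lazily caching cluster view. */
class CachedClusterView {
  /** Toy stand-in for DocCollection: a name plus a state version. */
  static final class CollectionState {
    final String name;
    final int version;

    CollectionState(String name, int version) {
      this.name = name;
      this.version = version;
    }
  }

  /** Fetches one collection's state from the (assumed) remote source. */
  private final Function<String, CollectionState> fetcher;

  /** Per-collection cache; entries may be stale at any time. */
  private final Map<String, CollectionState> cache = new ConcurrentHashMap<>();

  CachedClusterView(Function<String, CollectionState> fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns the cached state, fetching only on first use. May be stale. */
  CollectionState getCollection(String name) {
    return cache.computeIfAbsent(name, fetcher);
  }

  /** Drops the cached entry so the next read re-fetches it. */
  void invalidate(String name) {
    cache.remove(name);
  }

  /** Number of collections currently cached. */
  int cachedCount() {
    return cache.size();
  }

  public static void main(String[] args) {
    int[] remoteVersion = {1}; // simulated remote state that changes over time
    CachedClusterView view =
        new CachedClusterView(n -> new CollectionState(n, remoteVersion[0]));

    System.out.println(view.getCollection("books").version); // first read fetches
    remoteVersion[0] = 2;                                     // remote state moves on
    System.out.println(view.getCollection("books").version); // served from cache
    view.invalidate("books");                                 // caller detected staleness
    System.out.println(view.getCollection("books").version); // re-fetched
  }
}
```

Note there is deliberately no method returning a map of all collections;
a caller only ever pays for the collections it asks about by name.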

Has anyone thought similarly, or does anyone have concerns about such a
pursuit?

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
