Would per-replica state (PRS) help with that? That slices by replica, not collection, but it should allow finer-grained locking.
https://searchscale.com/blog/prs/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 16, 2024, at 9:03 AM, David Smiley <dsmi...@apache.org> wrote:
>
> At work, in a scenario when a node starts with thousands of cores for
> thousands of collections, we've seen that core registration can
> bottleneck on ZkStateReader.forceUpdateCollection(collection) which
> synchronizes on getUpdateLock, a global lock (not per-collection).  I
> don't know the history or strategy behind that lock, but it's a
> code-smell to see a global lock that is used in a circumstance that is
> scoped to one collection.  I suspect it's there because ClusterState
> is immutable and encompasses basically all state.  If it was instead a
> cache that can be snapshotted (for consumers that require an immutable
> state to act on), we could probably make getUpdateLock go away.  *If*
> a collection's state needs to be locked (and I'm suspicious that it
> is, so long as cache insertion is done properly / exclusively), we
> could have a lock just for the collection.
>
> Any concerns with this idea?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
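
To make the quoted idea concrete, here is a rough sketch of a cache with one lock per collection plus an immutable snapshot for consumers that need a stable view. This is not Solr code: the class CollectionStateCache, its methods, and the use of a plain String to stand in for a collection's state are all hypothetical, and it says nothing about how ZkStateReader or ClusterState are actually structured.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Solr code: per-collection state guarded by
// one lock per collection instead of a single global update lock.
class CollectionStateCache {
    // State is modeled as an opaque String; real state would be richer.
    private final ConcurrentHashMap<String, String> states = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Refresh one collection. Only callers touching the same collection
    // contend; updates to other collections proceed in parallel.
    String update(String collection, UnaryOperator<String> refresh) {
        ReentrantLock lock = locks.computeIfAbsent(collection, c -> new ReentrantLock());
        lock.lock();
        try {
            String next = refresh.apply(states.get(collection));
            if (next == null) {
                states.remove(collection); // a null result means "collection gone"
            } else {
                states.put(collection, next);
            }
            return next;
        } finally {
            lock.unlock();
        }
    }

    String get(String collection) {
        return states.get(collection);
    }

    // Immutable copy for consumers that need a stable view. The copy is
    // weakly consistent: it is not atomic across collections.
    Map<String, String> snapshot() {
        return Map.copyOf(states);
    }
}
```

A caller would do something like cache.update("collection1", old -> fetchFromZk("collection1")), where fetchFromZk is again a placeholder. Whether the per-collection lock is needed at all is the open question from the quoted message; ConcurrentHashMap.compute alone would also serialize updates per key.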