Thanks for raising this, Pierre. My short answer is that, for the use cases I'm directly concerned with, I don't think transient cores would be useful. We could absolutely benefit from them if it were simply a question of _querying_ (a relatively light and variable query load across many cores), but our indexing load is heavy and pretty consistent, which (unless I'm mistaken) would probably rule out transient cores.
I find myself wondering where the bulk of the memory savings comes from in the case of transient cores. Knowing that could help make an informed decision (and help those who may not currently use transient cores determine potential benefits). A couple of things jump to mind as primary memory consumers that would scale with the number of loaded cores:

1. Caches, if configured with warming (auto or static), could, I think, under the current stock implementation accumulate a fair amount of memory and hold onto it indefinitely.
2. Prior to Solr 9.7 (SOLR-16677), ThreadLocal hacks to maintain state for StoredFields also accumulated a significant amount of memory, particularly in high-core-count, high-thread-count cases.

I'd be very curious to gauge the impact of Solr 9.7/SOLR-16677 on the case that's addressed by transient cores. Regarding memory consumption by caches, there are other ways around this, such as node-level caches or TTL on cache entries (carried over by auto-warming).

Michael

On Thu, Sep 12, 2024 at 7:10 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> I think the fundamental underlying question is how SolrCloud is run. I’m
> under the impression that most deployments of SolrCloud tend to use all
> collections/shards all the time, in which case unloading cores is not
> overly useful.
>
> The use case for transient cores is when different collections or different
> shards of collections have different usage patterns and might spend
> relatively long periods of time without being used in which case unloading
> them from memory makes sense. This is the case of a multi tenant hosted
> environment such as the one Salesforce runs on top of SolrCloud (with ZERO
> replicas).
>
> Are there other similar use cases for SolrCloud in the industry?
>
> Ilan
>
> On Tue 10 Sep 2024 at 18:15, Pierre Salagnac <pierre.salag...@gmail.com>
> wrote:
>
> > Starting a thread to discuss transient core support in SolrCloud, and
> > mostly to figure out if Solr users would be interested in it.
> >
> > Transient cores allow a Solr node to not keep all cores in memory.
> > Basically, a given core may be dynamically loaded (if not already loaded)
> > to answer a request, and then unloaded later to free up some memory for
> > another core. This is a memory saver at the cost of a higher CPU
> > consumption. Depending on the cluster usage pattern, it may be very useful
> > or counter productive. A cluster with many cores/collections that are not
> > updated or queried concurrently will perform much better with transient
> > cores and appropriate tuning (let say we don't handle same data during the
> > day from during the night)
> >
> > That's a quite old feature, but as far as I know, it never worked with
> > SolrCloud (worked only in standalone mode).
> > This feature has been deprecated with SOLR-16591. My understanding is this
> > was mostly because of the lack of support in cloud mode, as most users now
> > run SolrCloud.
> >
> >
> > I've recently worked in our internal fork to make transient cores work with
> > SolrCloud with internal implementation of ZERO/SIP-20 replicas.[1] This for
> > sure takes some shortcuts since ZERO replicas don't support all Solr
> > features. Even if this does not run in production at scale yet, I reached a
> > point where I'm confident that [transient cores] + [ZERO replicas] can
> > work.
> > Transposing this work to NRT/TLOG/PULL replicas, I see only one pain point:
> > recovery is supposed to happen when we open the core. By skipping the core
> > opening at start-up, we also skip recovering/replicating cores from peers.
> > And by re-opening a core later, not sure how to make sure the replication
> > does not interact wrongly.
> >
> > Beside this last point that I don't know how to address right now, I don't
> > see any blocker in extending the logic from standalone to SolrCloud.
> >
> >
> > Now, I want to ask whether other people are interested in transient cores.
> > If yes, I can start by contributing the changes that make sense without
> > SIP-20, with the long term goal to un-deprecate this feature eventually.
> > If not, I'll just let the feature die.
> >
> > Thanks
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud