Thanks for raising this, Pierre. My short answer is that, for the use
cases I'm directly concerned with, I don't think transient cores would
be useful. We could absolutely benefit from them if it were simply a
question of _querying_ (a relatively light and variable query load
spread across many cores), but our indexing load is heavy and pretty
consistent, which (unless I'm mistaken) would probably rule out
transient cores.

I find myself wondering where the bulk of the memory savings comes
from in the case of transient cores. Knowing that could help make an
informed decision (and help those who don't currently use transient
cores assess the potential benefits).

A couple of things jump to mind as primary memory consumers that would
scale with the number of loaded cores:
1. Caches, if configured with warming (auto or static), could I think,
under the current stock implementation, accumulate a fair amount of
memory and hold onto it indefinitely.
2. Prior to Solr 9.7 (SOLR-16677), ThreadLocal hacks to maintain state
for StoredFields also accumulated a significant amount of memory,
particularly in high-core-count, high-thread-count cases.

I'd be very curious to gauge the impact of Solr 9.7/SOLR-16677 on the
case that's addressed by transient cores. Regarding memory consumption
by caches, there are other ways around this, such as node-level caches
or TTL on cache entries (carried over by auto-warming).
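
To make the "TTL carried over by auto-warming" idea concrete, here is a
rough sketch in Python. It is purely illustrative and not Solr code: the
class, method, and parameter names (TtlWarmedCache, warm_from,
max_idle_seconds) are all made up, and it assumes the TTL is measured
from the last real lookup of an entry rather than from the last warm.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable


@dataclass
class Entry:
    value: Any
    last_access: float  # time of the last real lookup, not of the last warm


class TtlWarmedCache:
    """Toy per-searcher cache (hypothetical, not a Solr class)."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self.entries: Dict[Hashable, Entry] = {}

    def put(self, key: Hashable, value: Any) -> None:
        self.entries[key] = Entry(value, self.clock())

    def get(self, key: Hashable) -> Any:
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.last_access = self.clock()  # only real lookups refresh the TTL
        return entry.value

    def warm_from(self, old: "TtlWarmedCache",
                  regenerate: Callable[[Hashable], Any]) -> None:
        """Auto-warm from the previous searcher's cache, skipping idle entries."""
        now = self.clock()
        for key, entry in old.entries.items():
            if now - entry.last_access > self.max_idle_seconds:
                continue  # idle for too long: drop it instead of re-warming
            # Carry the original timestamp over, so warming alone never
            # keeps an unused entry alive.
            self.entries[key] = Entry(regenerate(key), entry.last_access)
```

The point of carrying last_access across warm_from is that an entry
nobody queries eventually falls out of the cache, even if every new
searcher would otherwise re-warm it indefinitely.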

Michael

On Thu, Sep 12, 2024 at 7:10 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> I think the fundamental underlying question is how SolrCloud is run. I’m
> under the impression that most deployments of SolrCloud tend to use all
> collections/shards all the time, in which case unloading cores is not
> overly useful.
>
> The use case for transient cores is when different collections or different
> shards of collections have different usage patterns and might spend
> relatively long periods of time without being used in which case unloading
> them from memory makes sense. This is the case of a multi tenant hosted
> environment such as the one Salesforce runs on top of SolrCloud (with ZERO
> replicas).
>
> Are there other similar use cases for SolrCloud in the industry?
>
> Ilan
>
> On Tue 10 Sep 2024 at 18:15, Pierre Salagnac <pierre.salag...@gmail.com>
> wrote:
>
> > Starting a thread to discuss transient core support in SolrCloud, and
> > mostly to figure out if Solr users would be interested in it.
> >
> > Transient cores allow a Solr node to not keep all cores in memory.
> > Basically, a given core may be dynamically loaded (if not already loaded)
> > to answer a request, and then unloaded later to free up some memory for
> > another core. This is a memory saver at the cost of a higher CPU
> > consumption. Depending on the cluster usage pattern, it may be very useful
> > or counterproductive. A cluster with many cores/collections that are not
> > updated or queried concurrently will perform much better with transient
> > cores and appropriate tuning (let's say we don't handle the same data
> > during the day as during the night).
> >
> > That's a quite old feature, but as far as I know, it never worked with
> > SolrCloud (worked only in standalone mode).
> > This feature has been deprecated with SOLR-16591. My understanding is this
> > was mostly because of the lack of support in cloud mode, as most users now
> > run SolrCloud.
> >
> >
> > I've recently worked in our internal fork to make transient cores work with
> > SolrCloud with internal implementation of ZERO/SIP-20 replicas.[1] This for
> > sure takes some shortcuts since ZERO replicas don't support all Solr
> > features. Even if this does not run in production at scale yet, I reached a
> > point where I'm confident that [transient cores] + [ZERO replicas] can
> > work.
> > Transposing this work to NRT/TLOG/PULL replicas, I see only one pain point:
> > recovery is supposed to happen when we open the core. By skipping the core
> > opening at start-up, we also skip recovering/replicating cores from peers.
> > And by re-opening a core later, not sure how to make sure the replication
> > does not interact wrongly.
> >
> > Besides this last point, which I don't know how to address right now, I
> > don't see any blocker in extending the logic from standalone to SolrCloud.
> >
> >
> > Now, I want to ask whether other people are interested in transient cores.
> > If yes, I can start by contributing the changes that make sense without
> > SIP-20, with the long term goal to un-deprecate this feature eventually.
> > If not, I'll just let the feature die.
> >
> > Thanks
> >
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/SOLR/SIP-20%3A+Separation+of+Compute+and+Storage+in+SolrCloud
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org
