This is an awesome feature for SolrCloud! Currently, for our
read-heavy/write-heavy use case, we exclude all query requests from the
leader in each shard so that it does not become the load bottleneck.
Also, each SolrCloud cluster has its own pipeline for NRT updates. With
stateless replicas and persistent storage support, could one small
cluster be dedicated to handling the updates, while many serving
clusters pull the updated segments from the central persistent storage?
That would be a significant resource saving.
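
Roughly the kind of pull loop I have in mind on the serving side, sketched
with made-up interfaces (nothing below is an existing Solr API; it is only
to illustrate the split):

import java.nio.file.Path;
import java.util.List;

// Hypothetical interfaces, only to illustrate the idea; neither exists in Solr.
interface SharedStore {
    long latestGeneration(String shard);                      // shard metadata, e.g. kept in S3 or ZK
    List<String> filesForGeneration(String shard, long gen);  // segment files making up that image
    void download(String shard, String file, Path dest);      // copy one file to the local disk cache
}

interface LocalCore {
    long generation();
    boolean hasFile(String file);
    Path dir();
    void reopenSearcher(long newGeneration); // make the newly pulled segments searchable
}

class SegmentPuller {
    // Run periodically on every serving node: pull only what is missing locally.
    static void pullOnce(SharedStore store, LocalCore core, String shard) {
        long remoteGen = store.latestGeneration(shard);
        if (remoteGen <= core.generation()) {
            return; // already up to date
        }
        for (String file : store.filesForGeneration(shard, remoteGen)) {
            if (!core.hasFile(file)) {
                store.download(shard, file, core.dir());
            }
        }
        core.reopenSearcher(remoteGen);
    }
}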

Thanks,
Wei

On Sat, Apr 29, 2023 at 2:30 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> The changing/overlapping leaders was the main challenge in the
> implementation.
> Logic such as:
> If (iAmLeader()) {
>    doThings();
> }
> can end up with multiple participants running doThings() at the same time,
> because iAmLeader() could change just after it was checked. The only way
> out in such an
> approach is to do barriers (the old leader explicitly giving up leadership
> before the new one takes over… sounds familiar? 😉). This is complicated.
> What if the old leader is considered gone and can’t explicitly give up
> leadership, but is not actually gone? Not being seen by the quorum does not
> automatically mean it can no longer write to S3.
>
> We solved it by having the writing to S3 (or indeed any storage, we added
> an abstraction layer) use random file names (Solr file name + random
> suffix) so that two concurrent nodes would not overwrite each other even if
> they were writing similarly named segments/files.
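>
> For illustration, the suffixing could look roughly like this (a minimal
> sketch, not the actual code; all that matters is that each writer picks a
> suffix no other writer will pick):
>
> import java.util.UUID;
>
> class BlobNames {
>     // "_0.cfs" written by two competing leaders becomes e.g. "_0.cfs.<uuid1>"
>     // and "_0.cfs.<uuid2>", so neither blob overwrites the other in S3.
>     static String blobName(String solrFileName) {
>         return solrFileName + "." + UUID.randomUUID();
>     }
> }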
>
> Then we used a conditional update in ZooKeeper (one per write of one or more
> segments, not one per file) to have one of the two nodes “win” the write to
> S3. The data written by the losing node is ignored and not part of the S3
> image of the shard.
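>
> A minimal sketch of that "win by conditional update" step with the plain
> ZooKeeper client (the znode path and metadata format are made up here; only
> the versioned setData call is the real API):
>
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooKeeper;
> import org.apache.zookeeper.data.Stat;
>
> class ShardMetadataPublisher {
>     // Returns true if we "won": our new shard metadata (pointing at the
>     // files we just uploaded) became the authoritative image of the shard.
>     static boolean publish(ZooKeeper zk, String shardZnode, byte[] newMetadata)
>             throws KeeperException, InterruptedException {
>         Stat stat = new Stat();
>         zk.getData(shardZnode, false, stat); // read the current version
>         try {
>             // Conditional update: only succeeds if nobody changed the znode
>             // since we read it.
>             zk.setData(shardZnode, newMetadata, stat.getVersion());
>             return true;
>         } catch (KeeperException.BadVersionException e) {
>             return false; // the other node won; our uploaded files stay unreferenced
>         }
>     }
> }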
>
> Indeed running Solr from the local disk is essential (the cache aspect).
> Two orders of magnitude more space than in memory, more or less.
>
> And we run smaller shard sizes indeed!
>
> Thanks everybody for the feedback so far.
>
> Ilan
>
> On Sat 29 Apr 2023 at 07:08, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/28/23 11:33, Ilan Ginzburg wrote:
> > > Salesforce has been working for a while on separating compute from
> > > storage in SolrCloud, see presentation at Activate 2019 SolrCloud in
> > > Public Cloud: Scaling Compute Independently from Storage
> > > <https://youtu.be/6fE5KvOfb6A>.
> > > In a nutshell, the idea is that all SolrCloud nodes are stateless, have a
> > > local disk cache of the cores they're hosting but no persistent volumes
> > > (no persistent indexes nor transaction logs), and shard level persistence
> > > is done on S3.
> >
> > This is a very intriguing idea!  I think it would be particularly useful
> > for containerized setups that can add or remove nodes to meet changing
> > demands.
> >
> > My primary concern when I first looked at this was that with
> > network-based storage there would be little opportunity for caching, and
> > caching is SUPER critical for Solr performance.  Then when I began
> > writing this reply, I saw above that you're talking about a local disk
> > cache... so maybe that is not something to worry about.
> >
> > Bandwidth and latency limitations to/from the shared storage are another
> > concern, especially with big indexes that have segments up to 5GB.
> > Increasing the merge tier sizes and reducing the max segment size is
> > probably a very good idea.
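> >
> > With Lucene's TieredMergePolicy that would mean something along these
> > lines (just a sketch with placeholder numbers; in Solr this is normally
> > configured through the mergePolicyFactory in solrconfig.xml):
> >
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.index.TieredMergePolicy;
> >
> > class MergeSettings {
> >     // A smaller max segment size keeps individual transfers to/from shared
> >     // storage small; a higher segmentsPerTier tolerates more segments per
> >     // tier before merging kicks in. The numbers are illustrative only.
> >     static IndexWriterConfig smallSegmentsConfig() {
> >         TieredMergePolicy mp = new TieredMergePolicy();
> >         mp.setMaxMergedSegmentMB(512); // default is 5GB
> >         mp.setSegmentsPerTier(20);     // default is 10
> >         return new IndexWriterConfig().setMergePolicy(mp);
> >     }
> > }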
> >
> > Another challenge:  Ensuring that switching leaders happens reasonably
> > quickly while making sure that there cannot be multiple replicas
> > thinking they are leader at the same time.  Making the leader fencing
> > bulletproof is a critical piece of this.  I suspect that the existing
> > leader fencing could use some work, affecting SolrCloud in general.
> >
> > I don't want to get too deep in technical weeds, mostly because I do not
> > understand all the details ... but I am curious about something that
> > might affect this:  Are ephemeral znodes created by one Solr node
> > visible to other Solr nodes?  If they are, then I think ZK would provide
> > all the fencing needed, and could also keep track of the segments that
> > exist in the remote storage so follower replicas can quickly keep up
> > with the leader.
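> >
> > Something like this is the pattern I have in mind, using the plain
> > ZooKeeper client (the paths are made up, just to illustrate):
> >
> > import java.nio.charset.StandardCharsets;
> > import org.apache.zookeeper.CreateMode;
> > import org.apache.zookeeper.KeeperException;
> > import org.apache.zookeeper.ZooDefs;
> > import org.apache.zookeeper.ZooKeeper;
> >
> > class LeaderZnode {
> >     // On the node that believes it is leader: an ephemeral znode that goes
> >     // away automatically when this node's ZK session ends.
> >     static void register(ZooKeeper zk, String shard, String nodeName)
> >             throws KeeperException, InterruptedException {
> >         zk.create("/fencing/" + shard, nodeName.getBytes(StandardCharsets.UTF_8),
> >                 ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
> >     }
> >
> >     // On any other node: read who currently holds it and set a watch (via
> >     // the default watcher) to be told when it changes or disappears.
> >     static byte[] currentHolder(ZooKeeper zk, String shard)
> >             throws KeeperException, InterruptedException {
> >         return zk.getData("/fencing/" + shard, true, null);
> >     }
> > }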
> >
> > There could also be implementations for more mundane shared storage like
> > SMB or NFS.
> >
> > Thanks,
> > Shawn
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
> >
>
