+1 - sounds very promising!

On Sat, Apr 29, 2023 at 1:06 PM Wei <weiwan...@gmail.com> wrote:
>
> This is an awesome feature for SolrCloud! Currently, for our read-heavy/
> write-heavy use case, we exclude all query requests from the leader in each
> shard to avoid it becoming the load bottleneck. Also, each Solr cloud has its
> own pipeline for NRT updates.  With stateless replicas and persistent
> storage support, can one small cluster be dedicated to handling the updates,
> while many serving clouds all pull the updated segments from the central
> persistent storage? That would be a significant resource saving.
>
> Thanks,
> Wei
>
> On Sat, Apr 29, 2023 at 2:30 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> > The changing/overlapping leaders were the main challenge in the
> > implementation.
> > Logic such as:
> > if (iAmLeader()) {
> >     doThings();
> > }
> > can let multiple participants run doThings() at the same time, because
> > iAmLeader() could change just after it was checked. The only way out with
> > such an approach is to use barriers (the old leader explicitly giving up
> > leadership before the new one takes over… sounds familiar? 😉). This is
> > complicated: what if the old leader is considered gone (and so never
> > explicitly gives up) but is not actually gone? Not being seen by the quorum
> > does not automatically imply not being able to write to S3.
> >
> > We solved it by having writes to S3 (or indeed any storage; we added an
> > abstraction layer) use random file names (Solr file name + random suffix),
> > so that two concurrent nodes would not overwrite each other even if they
> > were writing similarly named segments/files.
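> >
> > To make that concrete, here is a minimal sketch (made-up names, not our
> > actual code) of building such a collision-free blob key:
> >
> > import java.util.UUID;
> >
> > // Sketch only: append a random suffix to the Solr file name so that two
> > // nodes pushing the same segment file to shared storage can never clobber
> > // each other. Which push actually "counts" is decided later in ZooKeeper.
> > final class BlobNames {
> >   static String blobKey(String shardPrefix, String solrFileName) {
> >     return shardPrefix + "/" + solrFileName + "." + UUID.randomUUID();
> >   }
> > }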
> >
> > Then we used a conditional update in ZooKeeper (one per write of one or more
> > segments, not one per file) to have one of the two nodes "win" the write to
> > S3. The data written by the losing node is ignored and is not part of the S3
> > image of the shard.
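> >
> > A minimal sketch of that conditional update, assuming the plain ZooKeeper
> > client API and a hypothetical per-shard metadata znode (again, not our
> > actual code):
> >
> > import org.apache.zookeeper.KeeperException;
> > import org.apache.zookeeper.ZooKeeper;
> >
> > // Sketch only: publish the new shard metadata (the list of segment files
> > // making up the shard image) only if nobody updated it since we read it.
> > // The znode version is the compare-and-set token; the loser's uploads are
> > // simply never referenced and can be garbage collected later.
> > final class ShardMetadataPublisher {
> >   static boolean tryPublish(ZooKeeper zk, String metadataZnode,
> >                             byte[] newMetadata, int versionWeRead)
> >       throws KeeperException, InterruptedException {
> >     try {
> >       zk.setData(metadataZnode, newMetadata, versionWeRead);
> >       return true;   // we won: our files become part of the shard image
> >     } catch (KeeperException.BadVersionException e) {
> >       return false;  // someone else won: ignore/clean up our uploads
> >     }
> >   }
> > }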
> >
> > Indeed, running Solr from the local disk is essential (the cache aspect):
> > it gives roughly two orders of magnitude more space than memory.
> >
> > And we run smaller shard sizes indeed!
> >
> > Thanks everybody for the feedback so far.
> >
> > Ilan
> >
> > On Sat 29 Apr 2023 at 07:08, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 4/28/23 11:33, Ilan Ginzburg wrote:
> > > > Salesforce has been working for a while on separating compute from
> > > > storage in SolrCloud, see the presentation at Activate 2019, "SolrCloud
> > > > in Public Cloud: Scaling Compute Independently from Storage"
> > > > <https://youtu.be/6fE5KvOfb6A>.
> > > > In a nutshell, the idea is that all SolrCloud nodes are stateless, have
> > > > a local disk cache of the cores they're hosting but no persistent
> > > > volumes (no persistent indexes nor transaction logs), and shard level
> > > > persistence is done on S3.
> > >
> > > This is a very intriguing idea!  I think it would be particularly useful
> > > for containerized setups that can add or remove nodes to meet changing
> > > demands.
> > >
> > > My primary concern when I first looked at this was that with
> > > network-based storage there would be little opportunity for caching, and
> > > caching is SUPER critical for Solr performance.  Then when I began
> > > writing this reply, I saw above that you're talking about a local disk
> > > cache... so maybe that is not something to worry about.
> > >
> > > Bandwidth and latency limitations to/from the shared storage are another
> > > concern, especially with big indexes that have segments up to 5GB.
> > > Increasing the merge tier sizes and reducing the max segment size is
> > > probably a very good idea.
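> > >
> > > For example (illustrative numbers only, not a recommendation), something
> > > like this in solrconfig.xml would cap merged segments at 1GB instead of
> > > the 5GB default while allowing larger tiers:
> > >
> > > <indexConfig>
> > >   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> > >     <!-- allow more segments per tier before merging -->
> > >     <int name="segmentsPerTier">20</int>
> > >     <!-- smaller maximum merged segment size (default is 5GB) -->
> > >     <double name="maxMergedSegmentMB">1024.0</double>
> > >   </mergePolicyFactory>
> > > </indexConfig>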
> > >
> > > Another challenge:  Ensuring that switching leaders happens reasonably
> > > quickly while making sure that there cannot be multiple replicas
> > > thinking they are leader at the same time.  Making the leader fencing
> > > bulletproof is a critical piece of this.  I suspect that the existing
> > > leader fencing could use some work, affecting SolrCloud in general.
> > >
> > > I don't want to get too deep in technical weeds, mostly because I do not
> > > understand all the details ... but I am curious about something that
> > > might affect this:  Are ephemeral znodes created by one Solr node
> > > visible to other Solr nodes?  If they are, then I think ZK would provide
> > > all the fencing needed, and could also keep track of the segments that
> > > exist in the remote storage so follower replicas can quickly keep up
> > > with the leader.
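> > >
> > > (A quick way to check that, assuming the plain ZooKeeper client API and a
> > > made-up znode path, would be something like the following.)
> > >
> > > import org.apache.zookeeper.CreateMode;
> > > import org.apache.zookeeper.ZooDefs;
> > > import org.apache.zookeeper.ZooKeeper;
> > >
> > > // Sketch only: session A creates an ephemeral znode, and a completely
> > > // separate session B looks it up; the znode lives only as long as A's
> > > // session does.
> > > final class EphemeralVisibilityCheck {
> > >   static boolean visibleToOtherSession(ZooKeeper sessionA, ZooKeeper sessionB)
> > >       throws Exception {
> > >     sessionA.create("/leader-fence-demo", new byte[0],
> > >         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
> > >     return sessionB.exists("/leader-fence-demo", false) != null;
> > >   }
> > > }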
> > >
> > > There could also be implementations for more mundane shared storage like
> > > SMB or NFS.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
For additional commands, e-mail: dev-h...@solr.apache.org
