+1 - sounds very promising!

On Sat, Apr 29, 2023 at 1:06 PM Wei <weiwan...@gmail.com> wrote:
>
> This is an awesome feature for SolrCloud! Currently, for our read-heavy/
> write-heavy use case, we exclude all query requests from the leader in
> each shard to avoid it becoming the load bottleneck. Also, each Solr
> cloud has its own pipeline for NRT updates. With stateless replicas and
> persistent storage support, could one small cluster be dedicated to
> handling the updates, while many serving clouds all pull the updated
> segments from central persistent storage? It would be a significant
> resource saving.
>
> Thanks,
> Wei
>
> On Sat, Apr 29, 2023 at 2:30 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> > The changing/overlapping leaders were the main challenge in the
> > implementation.
> > Logic such as:
> >
> >   if (iAmLeader()) {
> >     doThings();
> >   }
> >
> > can end up with multiple participants running doThings() at the same
> > time, because iAmLeader() could change just after it was checked. The
> > only way out in such an approach is to use barriers (the old leader
> > explicitly giving up leadership before the new one takes over… sounds
> > familiar? 😉). This is complicated. What if the old leader is
> > considered gone and can’t explicitly give up, but is not actually gone?
> > Not being seen by the quorum does not automatically imply not being
> > able to write to S3.
> >
> > We solved it by having the writes to S3 (or indeed any storage; we
> > added an abstraction layer) use random file names (Solr file name +
> > random suffix) so that two concurrent nodes would not overwrite each
> > other even if they were writing similarly named segments/files.
> >
> > Then we used a conditional update in ZooKeeper (one per write of one or
> > more segments, not one per file) to have one of the two nodes “win” the
> > write to S3. The data written by the losing node is ignored and is not
> > part of the S3 image of the shard.
> >
> > Indeed, running Solr from the local disk is essential (the cache
> > aspect). It gives two orders of magnitude more space than in memory,
> > more or less.
> >
> > And we run smaller shard sizes indeed!
> >
> > Thanks everybody for the feedback so far.
> >
> > Ilan
> >
> > On Sat, 29 Apr 2023 at 07:08, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 4/28/23 11:33, Ilan Ginzburg wrote:
> > > > Salesforce has been working for a while on separating compute from
> > > > storage in SolrCloud; see the Activate 2019 presentation "SolrCloud
> > > > in Public Cloud: Scaling Compute Independently from Storage"
> > > > <https://youtu.be/6fE5KvOfb6A>.
> > > > In a nutshell, the idea is that all SolrCloud nodes are stateless,
> > > > have a local disk cache of the cores they're hosting but no
> > > > persistent volumes (no persistent indexes nor transaction logs),
> > > > and shard-level persistence is done on S3.
> > >
> > > This is a very intriguing idea! I think it would be particularly
> > > useful for containerized setups that can add or remove nodes to meet
> > > changing demands.
> > >
> > > My primary concern when I first looked at this was that with
> > > network-based storage there would be little opportunity for caching,
> > > and caching is SUPER critical for Solr performance. Then when I began
> > > writing this reply, I saw above that you're talking about a local
> > > disk cache... so maybe that is not something to worry about.
> > >
> > > Bandwidth and latency limitations to/from the shared storage are
> > > another concern, especially with big indexes that have segments up
> > > to 5GB. Increasing the merge tier sizes and reducing the max segment
> > > size are probably a very good idea.
> > >
> > > Another challenge: ensuring that switching leaders happens reasonably
> > > quickly while making sure that there cannot be multiple replicas
> > > thinking they are leader at the same time. Making the leader fencing
> > > bulletproof is a critical piece of this. I suspect that the existing
> > > leader fencing could use some work, affecting SolrCloud in general.
> > >
> > > I don't want to get too deep into the technical weeds, mostly because
> > > I do not understand all the details... but I am curious about
> > > something that might affect this: are ephemeral znodes created by one
> > > Solr node visible to other Solr nodes? If they are, then I think ZK
> > > would provide all the fencing needed, and could also keep track of
> > > the segments that exist in the remote storage so follower replicas
> > > can quickly keep up with the leader.
> > >
> > > There could also be implementations for more mundane shared storage
> > > like SMB or NFS.
> > >
> > > Thanks,
> > > Shawn
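For anyone reading the archive, here is a rough sketch of the fencing Ilan
describes above (random file suffixes plus one conditional ZooKeeper update
per batch of segments), written against the plain ZooKeeper client API. The
class and member names (ShardPushSketch, pushSegments, shardMetadataPath,
LocalFile, the storage upload placeholder) are illustrative assumptions, not
the actual Salesforce/SolrCloud code:

import java.util.UUID;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ShardPushSketch {

  private final ZooKeeper zk;
  // Hypothetical znode holding the shard's "image" metadata on shared storage.
  private final String shardMetadataPath;

  public ShardPushSketch(ZooKeeper zk, String shardMetadataPath) {
    this.zk = zk;
    this.shardMetadataPath = shardMetadataPath;
  }

  // Pushes a batch of segment files to shared storage, then tries to "commit"
  // them with a single conditional update of the shard's metadata znode. Only
  // one concurrent writer can win the update; the loser's uniquely named files
  // are never referenced by the shard image and can be cleaned up later.
  public boolean pushSegments(byte[] newShardMetadata, Iterable<LocalFile> files)
      throws KeeperException, InterruptedException {
    // 1. Read the current metadata version; this acts as the fencing token.
    Stat stat = new Stat();
    zk.getData(shardMetadataPath, false, stat);
    int expectedVersion = stat.getVersion();

    // 2. Upload each file under a unique name (Solr file name + random suffix)
    //    so concurrent writers never overwrite each other, even when pushing
    //    identically named segment files.
    for (LocalFile file : files) {
      String blobName = file.solrFileName() + "." + UUID.randomUUID();
      uploadToSharedStorage(blobName, file);
    }

    // 3. Conditional update: succeeds only if nobody else committed since step 1.
    try {
      zk.setData(shardMetadataPath, newShardMetadata, expectedVersion);
      return true; // we won: the uploaded files are now part of the shard image
    } catch (KeeperException.BadVersionException lostRace) {
      return false; // someone else won: our uploads are simply ignored
    }
  }

  private void uploadToSharedStorage(String blobName, LocalFile file) {
    // Placeholder for the storage abstraction (S3 or any other backend).
  }

  public interface LocalFile {
    String solrFileName();
  }
}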
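And a minimal sketch of the merge tuning Shawn suggests (smaller max segment
size, larger tiers), expressed against Lucene's TieredMergePolicy. In Solr
this would normally be set through TieredMergePolicyFactory in solrconfig.xml
rather than in Java, and the numbers below are placeholders, not
recommendations:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class MergeTuningSketch {
  // Returns an IndexWriterConfig whose merge policy caps merged segments well
  // below Lucene's 5GB default and tolerates more segments per tier, so that
  // freshly merged segments pulled from shared storage stay small.
  public static IndexWriterConfig smallerSegmentsConfig() {
    TieredMergePolicy mergePolicy = new TieredMergePolicy();
    mergePolicy.setMaxMergedSegmentMB(1024); // placeholder: 1GB cap instead of 5GB
    mergePolicy.setSegmentsPerTier(20);      // placeholder: default is 10
    return new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(mergePolicy);
  }
}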