This definitely sounds very interesting, and if we could abstract it away from AWS specifically, then even better. I think there are a lot of advantages to an approach like this, as you've mentioned. At FullStory we are planning to start some experiments using GCP Local SSDs and Google Cloud Storage, so a very similar concept. This would also have been popular in other implementations I've seen. It's one of those ideas that, if it were relatively easy for users to implement, I think would get a lot of adoption.
I fully support this concept as a contribution and would be happy to help make that a reality.

On Fri, Apr 28, 2023 at 4:27 PM Joel Bernstein <joels...@gmail.com> wrote:

> I mentioned NIO providers for S3, GCS and Azure in a different email thread. These could be used to abstract away the S3-specific code and provide support for GCS and Azure without much more effort. This would also make unit tests much easier to write, because you can simply unit test against local disk by changing the URL scheme.
>
> The current design, as I understand it, only needs sequential reads and writes, as it's just pushing segments to remote storage. So I think NIO providers should support the basic needs.
>
> I think the design sounds really interesting and is a nice step in modernizing SolrCloud.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
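To make Joel's NIO point concrete, here is a minimal sketch of what backend-agnostic segment pushes/pulls could look like through java.nio.file. The class and method names are illustrative only (not Solr or AWS APIs); the assumption is that an NIO FileSystemProvider for the chosen scheme (e.g. a third-party S3 or GCS provider) is on the classpath, while unit tests just pass a file:// base URI.

    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    /**
     * Hypothetical sketch (not Solr code): backend-agnostic sequential push/pull of
     * segment files through java.nio.file. Pass "file:///tmp/solr-blobs/" in unit
     * tests; pass "s3://bucket/solr-blobs/" in production, assuming an NIO
     * FileSystemProvider for the "s3" scheme is installed on the classpath.
     */
    public class BlobStoreSketch {

      private final Path root;

      public BlobStoreSketch(URI baseUri) {
        // Resolves against whichever FileSystemProvider handles the URI scheme.
        this.root = Paths.get(baseUri);
      }

      /** Sequential write of one built segment file to remote (or local) storage. */
      public void pushSegmentFile(String shard, String fileName, byte[] bytes) throws Exception {
        Path dest = root.resolve(shard).resolve(fileName);
        Files.createDirectories(dest.getParent()); // cheap or unnecessary on most object-store providers
        Files.write(dest, bytes, StandardOpenOption.CREATE_NEW);
      }

      /** Sequential read when hydrating a core onto a node's local disk cache. */
      public byte[] pullSegmentFile(String shard, String fileName) throws Exception {
        return Files.readAllBytes(root.resolve(shard).resolve(fileName));
      }
    }

Swapping the file:// base URI for an s3:// or gs:// one changes the backend without touching the calling code, which is exactly the testability property Joel describes.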
> On Fri, Apr 28, 2023 at 4:03 PM David Smiley <dsmi...@apache.org> wrote:
>
> > To clarify the point to everyone: "separation of compute from storage" allows infrastructure cost savings when you have both large scale (many shards in the cluster) and highly diverse collection/index utilization. The vision of our contribution is that an unused shard can scale down to as little as zero replicas, if you are willing for a future request to be delayed by S3 hydration. The implementation also has SolrCloud robustness advantages compared to NRT/TLOG replicas. This is "the vision"; we haven't achieved zero replicas yet and have perhaps only partially realized the stability advantages, and of course we've got some issues to address. Nevertheless, with our continued investment in this approach (probable but not certain), that's where we'll be.
> >
> > I'm sure a number of users/businesses would find these advantages desirable. Maybe not most users, but some. Small data sets, or ones that are constantly being used, might not see the primary advantage.
> >
> > Sadly, we did not look at HdfsDirectory with an S3 backend because we completely missed the possibility of doing so. I wish this were promoted more, for example in the documentation / ref guide. HdfsDirectory might achieve the aforementioned goals, albeit with varying pros/cons relative to the proposed contribution. Also, statements about what the contribution does and could do are a moving target; meanwhile HdfsDirectory has not seen investment in features/optimizations, but I can easily imagine some!
> >
> > With my Apache hat on, my main concern with the proposed contribution is hearing opinions on how well it fits into SolrCloud / the codebase, and the resulting maintenance burden as code owners. That might be hard for anyone to comment on right now because you can't see the code, but if there are preliminary thoughts on what Ilan has written in this regard, please share them. If we don't ultimately contribute the aforementioned functionality, I wonder if the developer community here might agree with the replica type enum moving to an actual abstraction (i.e. an interface) to allow for the possibility of other replica types for Solr hackers like us; perhaps it would improve the SolrCloud code in so doing.
> >
> > A more concrete concern I have is how to limit dependencies. The contribution adds AWS S3 dependencies to solr-core, but that's not at all okay in Apache Solr.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
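On David's point about the replica type enum becoming an actual abstraction: purely as a hypothetical illustration (none of these names exist in Solr today, where Replica.Type is an enum with NRT/TLOG/PULL), an interface along roughly these lines is the kind of thing that would let a plugin register an additional type such as the proposed blob-backed replica.

    /**
     * Hypothetical illustration only; nothing below exists in Solr. Today
     * Replica.Type is an enum (NRT, TLOG, PULL). An interface roughly like
     * this is what could let a plugin register an additional type such as a
     * shared/blob-backed replica.
     */
    public interface ReplicaBehavior {

      /** Name stored in collection state, e.g. "NRT", "TLOG", "PULL", "SHARED_BLOB". */
      String name();

      /** Whether this replica indexes documents locally or only pulls built segments. */
      boolean indexesLocally();

      /** Whether this replica maintains a local transaction log. */
      boolean maintainsTransactionLog();

      /** Whether this replica may be elected shard leader. */
      boolean leaderEligible();

      /** How the replica recovers its index after coming up. */
      RecoveryStrategy recoveryStrategy();

      enum RecoveryStrategy { PEER_SYNC, FULL_REPLICATION, REMOTE_STORE_FETCH }
    }

Whether these are the right methods is exactly the design discussion David is inviting; the point is only that an interface, unlike an enum, leaves room for types defined outside solr-core.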
> > On Fri, Apr 28, 2023 at 1:33 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > This is a long message, apologies. If responses are positive, there will likely be plenty of other opportunities to discuss the topics mentioned here.
> > >
> > > I'm trying to see if the community would be interested in a contribution allowing SolrCloud nodes to be totally stateless, with persistent storage done in S3 (or similar).
> > >
> > > Salesforce has been working for a while on separating compute from storage in SolrCloud; see the Activate 2019 presentation "SolrCloud in Public Cloud: Scaling Compute Independently from Storage" <https://youtu.be/6fE5KvOfb6A>. In a nutshell, the idea is that all SolrCloud nodes are stateless: they have a local disk cache of the cores they're hosting but no persistent volumes (no persistent indexes nor transaction logs), and shard-level persistence is done on S3.
> > >
> > > This is different from classic per-node remote storage:
> > > - No "per node" transaction log (therefore no need to wait for down nodes to come back up to recover their transaction log data)
> > > - A single per-shard copy on S3 (not a copy per replica)
> > > - Local non-persistent node volumes (SSDs, for example) are a cache and are used to serve queries and do indexing (so the local cache is limited by disk size, not by memory as it is in HdfsDirectory, for example)
> > > - When adding a node and replicas, the index content for the replicas is fetched from S3, not from other (possibly already overloaded) nodes
> > >
> > > The current state of our code is:
> > > - It's a fork (it was shared as a branch for a while but eventually removed)
> > > - We introduced a new replica type that is explicitly based on remote blob storage (and is otherwise similar to TLOG) and that guards against concurrent writes to S3 (when shard leadership changes in SolrCloud, there can be an overlap period with two nodes acting as leaders, which can cause corruption with a naive implementation of remote storage at the shard level)
> > > - Currently we only push constructed segments to S3, which forces committing every batch before acknowledging it to the client (we can't rely on a node's transaction log given that it would be lost upon node restart, the node having no persistent storage)
> > >
> > > We're considering improving this approach by making the transaction log a shard-level abstraction (rather than a replica/node abstraction) and storing it in S3 as well, with a transaction log per shard, not per replica. This would allow indexing to not commit on every batch, speed up /update requests, push the constructed segments asynchronously to S3, and guarantee data durability while still allowing nodes to be stateless (so they can be shut down at any time, in any number, without data loss and without having to restart those nodes to recover data only they can access).
> > >
> > > If there is interest in such a contribution to SolrCloud, then we might carry out the next phase of the work upstream (initially in a branch, with the intention to eventually merge).
> > >
> > > If there is no interest, it's easier for us to keep working on our fork, which allows taking shortcuts and ignoring features of SolrCloud we do not use (for example, replacing the existing transaction log with a shard-level transaction log rather than maintaining both).
> > >
> > > Feedback welcome! I can present the idea in more detail at a future committer/contributor meeting if there's interest.
> > >
> > > Thanks,
> > > Ilan
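For readers wondering about the leader-overlap hazard Ilan mentions above, here is a rough, hypothetical sketch of one way a shard-level guard could work, assuming a per-shard version counter kept in a strongly consistent store (e.g. ZooKeeper) and version-prefixed files in S3. This is not the Salesforce implementation and none of these types exist in Solr; it only illustrates the check-then-push idea.

    import java.util.List;

    /**
     * Hypothetical sketch of a shard-level guard against two overlapping leaders
     * pushing to the same remote copy. Assumes a per-shard version counter in a
     * strongly consistent store (e.g. ZooKeeper) and version-prefixed files in S3.
     */
    public class ShardBlobPusher {

      /** Per-shard metadata with an atomic compare-and-increment (assumed, e.g. ZooKeeper-backed). */
      public interface ShardMetadataStore {
        long currentVersion(String shard);
        boolean compareAndIncrement(String shard, long expectedVersion);
      }

      /** Remote blob storage that writes files under a version prefix (assumed S3-backed). */
      public interface BlobStore {
        void pushFiles(String shard, long version, List<String> localSegmentFiles);
      }

      private final ShardMetadataStore metadata;
      private final BlobStore blobs;

      public ShardBlobPusher(ShardMetadataStore metadata, BlobStore blobs) {
        this.metadata = metadata;
        this.blobs = blobs;
      }

      /**
       * Pushes newly built segments only if this node still owns the latest shard version.
       * A stale leader fails the compare-and-increment and must re-sync instead of
       * overwriting the shard's single remote copy.
       */
      public boolean pushIfStillLeader(String shard, List<String> segmentFiles) {
        long observed = metadata.currentVersion(shard);
        if (!metadata.compareAndIncrement(shard, observed)) {
          return false; // another leader advanced the shard; abort this push
        }
        // Files are written under the new version, so a concurrent stale push
        // (which failed its compare-and-increment) can never clobber them.
        blobs.pushFiles(shard, observed + 1, segmentFiles);
        return true;
      }
    }

A real implementation would also need to handle failed pushes after the version bump, readers discovering the latest version, and cleanup of stale files, which is where most of the complexity lives.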