I mentioned NIO providers for S3, GCS, and Azure in a different email thread. These could be used to abstract away the S3-specific code and provide support for GCS and Azure without much more effort. It would also make unit tests much easier to write, because you can test against local disk simply by changing the URL scheme.
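Something along these lines (a minimal sketch with made-up class and method names; the s3:// scheme assumes an NIO FileSystemProvider for S3, such as the AWS NIO SPI for S3, is on the classpath):

import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: the only storage-specific detail is the base URI.
public class RemoteSegmentStore {

    private final Path remoteRoot;

    // baseUri might be "file:///tmp/fake-remote/" in a unit test, or
    // "s3://my-bucket/solr/" in production (the latter requires an s3://
    // FileSystemProvider on the classpath).
    public RemoteSegmentStore(URI baseUri) {
        this.remoteRoot = Paths.get(baseUri);
    }

    // Sequential write: copy a locally built segment file to remote storage.
    public void push(Path localSegmentFile, String shardName) throws Exception {
        Path dest = remoteRoot.resolve(shardName)
                .resolve(localSegmentFile.getFileName().toString());
        Files.createDirectories(dest.getParent());
        Files.copy(localSegmentFile, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    // Sequential read: pull a segment file back into the local disk cache.
    public void pull(String shardName, String fileName, Path localDir) throws Exception {
        Path src = remoteRoot.resolve(shardName).resolve(fileName);
        Files.copy(src, localDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        // Unit-test style usage: "remote" storage is just a local temp directory.
        RemoteSegmentStore store = new RemoteSegmentStore(URI.create("file:///tmp/fake-remote/"));
        Path segment = Files.createTempFile("_0", ".cfs"); // stand-in for a Lucene segment file
        store.push(segment, "shard1");
        // Production would pass e.g. URI.create("s3://my-bucket/solr/") instead.
    }
}

A unit test then just points the store at a temp directory with a file:// URI; no S3 mocking needed.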
The current design as I understand it only needs sequential reads and writes, as it's just pushing segments to remote storage, so I think NIO providers should cover the basic needs. I think the design sounds really interesting and is a nice step in modernizing SolrCloud.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 28, 2023 at 4:03 PM David Smiley <dsmi...@apache.org> wrote:

> To clarify the point to everyone: "separation of compute from storage" allows infrastructure cost savings when you have both large scale (many shards in the cluster) and highly diverse collection/index utilization. The vision of our contribution is that an unused shard can scale down to as little as zero replicas if you are willing for a future request to be delayed for S3 hydration. The implementation also has SolrCloud robustness advantages compared to NRT/TLOG replicas. This is "the vision"; we haven't achieved zero replicas yet and have only partially realized the stability advantages, and of course we've got some issues to address. Nevertheless, with our continued investment in this approach (probable but not certain), that's where we'll be.
>
> I'm sure a number of users/businesses would find these advantages desirable. Maybe not most users, but some. Small data sets, or ones that are constantly in use, might not see the primary advantage.
>
> Sadly, we did not look at HdfsDirectory with an S3 backend because we completely missed the possibility of doing so. I wish this were promoted more, e.g. in the documentation / ref guide. HdfsDirectory might achieve the aforementioned goals, albeit with varying pros/cons relative to the proposed contribution. Also, statements about what the contribution does and could do are a moving target; meanwhile HdfsDirectory has not seen investment in features/optimizations, but I can easily imagine some!
>
> With my Apache hat on, my main concern with the proposed contribution is hearing opinions on how well it fits into SolrCloud / the codebase, and the resulting maintenance burden on us as code owners. That might be hard for anyone to comment on right now because you can't see it, but if there are preliminary thoughts on what Ilan has written in this regard, please share them. If we don't ultimately contribute the aforementioned functionality, I wonder whether the developer community here might agree to moving the replica type enum to an actual abstraction (i.e. an interface) to allow for other replica types for Solr hackers like us; perhaps it'd improve the SolrCloud code in so doing.
>
> A more concrete concern I have is how to limit dependencies. The contribution adds AWS S3 dependencies to solr-core, but that's not at all okay in Apache Solr.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Fri, Apr 28, 2023 at 1:33 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> > Hi,
> >
> > This is a long message, apologies. If responses are positive, there will likely be plenty of other opportunities to discuss the topics mentioned here.
> >
> > I'm trying to see if the community would be interested in a contribution allowing SolrCloud nodes to be totally stateless, with persistent storage done in S3 (or similar).
> > Salesforce has been working for a while on separating compute from storage in SolrCloud; see the Activate 2019 presentation "SolrCloud in Public Cloud: Scaling Compute Independently from Storage" <https://youtu.be/6fE5KvOfb6A>. In a nutshell, the idea is that all SolrCloud nodes are stateless: they have a local disk cache of the cores they're hosting but no persistent volumes (no persistent indexes or transaction logs), and shard-level persistence is done on S3.
> >
> > This is different from classic per-node remote storage:
> > - No "per node" transaction log (therefore no need to wait for down nodes to come back up to recover their transaction log data)
> > - A single per-shard copy on S3 (not a copy per replica)
> > - Local non-persistent node volumes (SSDs, for example) are a cache used to serve queries and do indexing (so the local cache is limited by disk size, not by memory as it is in HdfsDirectory, for example)
> > - When adding a node and replicas, the index content for the replicas is fetched from S3, not from other (possibly already overloaded) nodes
> >
> > The current state of our code is:
> > - It's a fork (it was shared as a branch for a while but eventually removed)
> > - We introduced a new replica type that is explicitly based on remote blob storage (and otherwise similar to TLOG) and that guards against concurrent writes to S3 (when shard leadership changes in SolrCloud, there can be an overlap period with two nodes acting as leaders, which can cause corruption with a naive implementation of remote storage at the shard level)
> > - Currently we only push constructed segments to S3, which forces committing every batch before acknowledging it to the client (we can't rely on a node's transaction log given it would be lost upon node restart, the node having no persistent storage)
> >
> > We're considering improving this approach by making the transaction log a shard-level abstraction (rather than a replica/node abstraction) and storing it in S3 as well, with a transaction log per shard, not per replica. This would allow indexing to not commit on every batch, speed up /update requests, push the constructed segments to S3 asynchronously, and guarantee data durability while still keeping nodes stateless (so they can be shut down at any time, in any number, without data loss and without having to restart them to recover data only they can access).
> >
> > If there is interest in such a contribution to SolrCloud, then we might carry the next phase of the work upstream (initially in a branch, with the intention to eventually merge).
> >
> > If there is no interest, it's easier for us to keep working on our fork, which allows taking shortcuts and ignoring features of SolrCloud we do not use (for example, replacing the existing transaction log with a shard-level transaction log rather than maintaining both).
> >
> > Feedback welcome! I can present the idea in more detail in a future committer/contributor meeting if there's interest.
> >
> > Thanks,
> > Ilan
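As a footnote on the leadership-overlap point above: whatever form the shard-level store takes, it needs some kind of conditional publish so a stale leader can't overwrite what a newer leader has already pushed. A toy sketch of that guard follows (hypothetical names, with an in-memory stand-in for the shard metadata that would really live in S3 or ZooKeeper; this is not the actual Salesforce implementation):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Per-shard metadata a real store would keep next to the segments in S3 (or in ZooKeeper).
record ShardMetadata(long leaderTerm, long generation) {}

class GuardedShardStore {
    private final Map<String, AtomicReference<ShardMetadata>> shards = new ConcurrentHashMap<>();

    // Returns true if this writer's push is accepted, false if a newer leader already published.
    boolean tryPush(String shard, long writerLeaderTerm) {
        AtomicReference<ShardMetadata> ref =
                shards.computeIfAbsent(shard, s -> new AtomicReference<>(new ShardMetadata(0, 0)));
        while (true) {
            ShardMetadata current = ref.get();
            if (writerLeaderTerm < current.leaderTerm()) {
                return false; // stale leader: refuse to overwrite the newer leader's data
            }
            // A real store would upload the new segment files here, under names that include
            // the new generation, so a losing writer never overwrites already-published files.
            ShardMetadata next = new ShardMetadata(writerLeaderTerm, current.generation() + 1);
            if (ref.compareAndSet(current, next)) {
                return true; // metadata published atomically; readers now see the new generation
            }
            // Lost a race with a concurrent writer; re-read the metadata and re-check the term.
        }
    }
}

public class LeaderOverlapDemo {
    public static void main(String[] args) {
        GuardedShardStore store = new GuardedShardStore();
        System.out.println(store.tryPush("shard1", 2)); // current leader (term 2): accepted -> true
        System.out.println(store.tryPush("shard1", 1)); // zombie old leader (term 1): rejected -> false
    }
}

The essential property is that the metadata update is atomic and term-checked; the naive last-writer-wins version is exactly what corrupts a shard during a leadership overlap.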