This definitely sounds very interesting, and if we could abstract it away from AWS specifically, then even better. I think there are a lot of advantages to an approach like this, as you've mentioned. At FullStory we are planning to start some experiments using GCP Local SSDs and Google Cloud Storage, so a very similar concept. This would also have been popular in other implementations I've seen. It's one of those ideas that, if it were relatively easy for users to implement, I think would get a lot of adoption.
I fully support this concept as a contribution and would be happy to help make that a reality.

On Fri, Apr 28, 2023 at 4:27 PM Joel Bernstein <joels...@gmail.com> wrote:

> I mentioned NIO providers for S3, GCS and Azure in a different email thread. These could be used to abstract away the S3-specific code and provide support for GCS and Azure without much more effort. This would also make unit tests much easier to write, because you can simply unit test against local disk by changing the URL scheme.
>
> The current design, as I understand it, only needs sequential reads and writes, as it's just pushing segments to remote storage. So I think NIO providers should support the basic needs.
>
> I think the design sounds really interesting and is a nice step in modernizing SolrCloud.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
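To make Joel's NIO point concrete, here is a minimal sketch of what backend-agnostic segment pushes/pulls could look like through java.nio.file. The class and method names are illustrative only (not Solr or AWS APIs); the assumption is that an NIO FileSystemProvider for the chosen scheme (e.g. a third-party S3 or GCS provider) is on the classpath, while unit tests just pass a file:// base URI.

    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    /**
     * Hypothetical sketch (not Solr code): backend-agnostic sequential push/pull of
     * segment files through java.nio.file. Pass "file:///tmp/solr-blobs/" in unit
     * tests; pass "s3://bucket/solr-blobs/" in production, assuming an NIO
     * FileSystemProvider for the "s3" scheme is installed on the classpath.
     */
    public class BlobStoreSketch {

      private final Path root;

      public BlobStoreSketch(URI baseUri) {
        // Resolves against whichever FileSystemProvider handles the URI scheme.
        this.root = Paths.get(baseUri);
      }

      /** Sequential write of one built segment file to remote (or local) storage. */
      public void pushSegmentFile(String shard, String fileName, byte[] bytes) throws Exception {
        Path dest = root.resolve(shard).resolve(fileName);
        Files.createDirectories(dest.getParent()); // cheap or unnecessary on most object-store providers
        Files.write(dest, bytes, StandardOpenOption.CREATE_NEW);
      }

      /** Sequential read when hydrating a core onto a node's local disk cache. */
      public byte[] pullSegmentFile(String shard, String fileName) throws Exception {
        return Files.readAllBytes(root.resolve(shard).resolve(fileName));
      }
    }

Swapping the file:// base URI for an s3:// or gs:// one changes the backend without touching the calling code, which is exactly the testability property Joel describes.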
> On Fri, Apr 28, 2023 at 4:03 PM David Smiley <dsmi...@apache.org> wrote:
>
> > To clarify the point to everyone: "separation of compute from storage" allows infrastructure cost savings when you have both large scale (many shards in the cluster) and highly diverse collection/index utilization. The vision of our contribution is that an unused shard can scale down to as little as zero replicas, if you are willing for a future request to be delayed by S3 hydration. The implementation also has SolrCloud robustness advantages compared to NRT/TLOG replicas. This is "the vision"; we haven't achieved zero replicas yet and have perhaps only partially realized the stability advantages, and of course we've got some issues to address. Nevertheless, with our continued investment in this approach (probable but not certain), that's where we'll be.
> >
> > I'm sure a number of users/businesses would find these advantages desirable. Maybe not most users, but some. Small data sets, or ones that are constantly being used, might not see the primary advantage.
> >
> > Sadly, we did not look at HdfsDirectory with an S3 backend because we completely missed the possibility of doing so. I wish this were promoted more, for example in the documentation / ref guide. HdfsDirectory might achieve the aforementioned goals, albeit with varying pros/cons relative to the proposed contribution. Also, statements about what the contribution does and could do are a moving target; meanwhile HdfsDirectory has not seen investment in features/optimizations, but I can easily imagine some!
> >
> > With my Apache hat on, my main concern with the proposed contribution is hearing opinions on how well it fits into SolrCloud / the codebase, and the resulting maintenance burden as code owners. That might be hard for anyone to comment on right now because you can't see the code, but if there are preliminary thoughts on what Ilan has written in this regard, please share them. If we don't ultimately contribute the aforementioned functionality, I wonder if the developer community here might agree with the replica type enum moving to an actual abstraction (i.e. an interface) to allow for the possibility of other replica types for Solr hackers like us; perhaps it would improve the SolrCloud code in so doing.
> >
> > A more concrete concern I have is how to limit dependencies. The contribution adds AWS S3 dependencies to solr-core, but that's not at all okay in Apache Solr.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
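On David's point about the replica type enum becoming an actual abstraction: purely as a hypothetical illustration (none of these names exist in Solr today, where Replica.Type is an enum with NRT/TLOG/PULL), an interface along roughly these lines is the kind of thing that would let a plugin register an additional type such as the proposed blob-backed replica.

    /**
     * Hypothetical illustration only; nothing below exists in Solr. Today
     * Replica.Type is an enum (NRT, TLOG, PULL). An interface roughly like
     * this is what could let a plugin register an additional type such as a
     * shared/blob-backed replica.
     */
    public interface ReplicaBehavior {

      /** Name stored in collection state, e.g. "NRT", "TLOG", "PULL", "SHARED_BLOB". */
      String name();

      /** Whether this replica indexes documents locally or only pulls built segments. */
      boolean indexesLocally();

      /** Whether this replica maintains a local transaction log. */
      boolean maintainsTransactionLog();

      /** Whether this replica may be elected shard leader. */
      boolean leaderEligible();

      /** How the replica recovers its index after coming up. */
      RecoveryStrategy recoveryStrategy();

      enum RecoveryStrategy { PEER_SYNC, FULL_REPLICATION, REMOTE_STORE_FETCH }
    }

Whether these are the right methods is exactly the design discussion David is inviting; the point is only that an interface, unlike an enum, leaves room for types defined outside solr-core.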
> > On Fri, Apr 28, 2023 at 1:33 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > This is a long message, apologies. If responses are positive, there will likely be plenty of other opportunities to discuss the topics mentioned here.
> > >
> > > I'm trying to see if the community would be interested in a contribution allowing SolrCloud nodes to be totally stateless, with persistent storage done in S3 (or similar).
> > >
> > > Salesforce has been working for a while on separating compute from storage in SolrCloud; see the Activate 2019 presentation "SolrCloud in Public Cloud: Scaling Compute Independently from Storage" <https://youtu.be/6fE5KvOfb6A>. In a nutshell, the idea is that all SolrCloud nodes are stateless: they have a local disk cache of the cores they're hosting but no persistent volumes (no persistent indexes nor transaction logs), and shard-level persistence is done on S3.
> > >
> > > This is different from classic per-node remote storage:
> > > - No "per node" transaction log (therefore no need to wait for down nodes to come back up to recover their transaction log data)
> > > - A single per-shard copy on S3 (not a copy per replica)
> > > - Local non-persistent node volumes (SSDs, for example) are a cache and are used to serve queries and do indexing (so the local cache is limited by disk size, not by memory as it is in HdfsDirectory, for example)
> > > - When adding a node and replicas, the index content for the replicas is fetched from S3, not from other (possibly already overloaded) nodes
> > >
> > > The current state of our code is:
> > > - It's a fork (it was shared as a branch for a while but eventually removed)
> > > - We introduced a new replica type that is explicitly based on remote blob storage (and is otherwise similar to TLOG) and that guards against concurrent writes to S3 (when shard leadership changes in SolrCloud, there can be an overlap period with two nodes acting as leaders, which can cause corruption with a naive implementation of remote storage at the shard level)
> > > - Currently we only push constructed segments to S3, which forces committing every batch before acknowledging it to the client (we can't rely on a node's transaction log given that it would be lost upon node restart, the node having no persistent storage)
> > >
> > > We're considering improving this approach by making the transaction log a shard-level abstraction (rather than a replica/node abstraction) and storing it in S3 as well, with a transaction log per shard, not per replica. This would allow indexing to not commit on every batch, speed up /update requests, push the constructed segments asynchronously to S3, and guarantee data durability while still allowing nodes to be stateless (so they can be shut down at any time, in any number, without data loss and without having to restart those nodes to recover data only they can access).
> > >
> > > If there is interest in such a contribution to SolrCloud, then we might carry out the next phase of the work upstream (initially in a branch, with the intention to eventually merge).
> > >
> > > If there is no interest, it's easier for us to keep working on our fork, which allows taking shortcuts and ignoring features of SolrCloud we do not use (for example, replacing the existing transaction log with a shard-level transaction log rather than maintaining both).
> > >
> > > Feedback welcome! I can present the idea in more detail at a future committer/contributor meeting if there's interest.
> > >
> > > Thanks,
> > > Ilan
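For readers wondering about the leader-overlap hazard Ilan mentions above, here is a rough, hypothetical sketch of one way a shard-level guard could work, assuming a per-shard version counter kept in a strongly consistent store (e.g. ZooKeeper) and version-prefixed files in S3. This is not the Salesforce implementation and none of these types exist in Solr; it only illustrates the check-then-push idea.

    import java.util.List;

    /**
     * Hypothetical sketch of a shard-level guard against two overlapping leaders
     * pushing to the same remote copy. Assumes a per-shard version counter in a
     * strongly consistent store (e.g. ZooKeeper) and version-prefixed files in S3.
     */
    public class ShardBlobPusher {

      /** Per-shard metadata with an atomic compare-and-increment (assumed, e.g. ZooKeeper-backed). */
      public interface ShardMetadataStore {
        long currentVersion(String shard);
        boolean compareAndIncrement(String shard, long expectedVersion);
      }

      /** Remote blob storage that writes files under a version prefix (assumed S3-backed). */
      public interface BlobStore {
        void pushFiles(String shard, long version, List<String> localSegmentFiles);
      }

      private final ShardMetadataStore metadata;
      private final BlobStore blobs;

      public ShardBlobPusher(ShardMetadataStore metadata, BlobStore blobs) {
        this.metadata = metadata;
        this.blobs = blobs;
      }

      /**
       * Pushes newly built segments only if this node still owns the latest shard version.
       * A stale leader fails the compare-and-increment and must re-sync instead of
       * overwriting the shard's single remote copy.
       */
      public boolean pushIfStillLeader(String shard, List<String> segmentFiles) {
        long observed = metadata.currentVersion(shard);
        if (!metadata.compareAndIncrement(shard, observed)) {
          return false; // another leader advanced the shard; abort this push
        }
        // Files are written under the new version, so a concurrent stale push
        // (which failed its compare-and-increment) can never clobber them.
        blobs.pushFiles(shard, observed + 1, segmentFiles);
        return true;
      }
    }

A real implementation would also need to handle failed pushes after the version bump, readers discovering the latest version, and cleanup of stale files, which is where most of the complexity lives.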