Hi,

This is a long message, apologies. If responses are positive, there will likely be plenty of other opportunities to discuss the topics mentioned here.

I'm trying to see if the community would be interested in a contribution allowing SolrCloud nodes to be totally stateless, with persistent storage done in S3 (or similar). Salesforce has been working for a while on separating compute from storage in SolrCloud; see the Activate 2019 presentation "SolrCloud in Public Cloud: Scaling Compute Independently from Storage" <https://youtu.be/6fE5KvOfb6A>.

In a nutshell, the idea is that all SolrCloud nodes are stateless: each node has a local disk cache of the cores it hosts, but no persistent volumes (no persistent indexes nor transaction logs), and shard-level persistence is done on S3. This is different from classic per-node remote storage:
- No per-node transaction log, therefore no need to wait for down nodes to come back up to recover their transaction log data.
- A single per-shard copy on S3 (not a copy per replica).
- Local non-persistent node volumes (SSDs, for example) are a cache, used to serve queries and do indexing. The local cache is therefore not limited by memory (as it is in HdfsDirectory, for example) but by disk size.
- When adding a node and replicas, the index content for the replicas is fetched from S3, not from other (possibly already overloaded) nodes. A rough sketch of what that fetch could look like follows this list.
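To make that last point concrete, here is a minimal, hypothetical sketch (not our actual code) of a new replica hydrating its local cache from the per-shard copy on S3, using the AWS SDK for Java v2. The bucket, the one-prefix-per-shard key layout, and the class and method names are all illustrative assumptions:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import software.amazon.awssdk.core.sync.ResponseTransformer;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
    import software.amazon.awssdk.services.s3.model.S3Object;

    /** Hypothetical sketch: pull a shard's index files from S3 into the
     *  node-local, non-persistent cache directory of a new replica. */
    public class ShardCacheHydrator {
      public static void hydrate(S3Client s3, String bucket, String shardName,
                                 Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);
        String prefix = shardName + "/"; // assumed key layout: one prefix per shard
        ListObjectsV2Request req = ListObjectsV2Request.builder()
            .bucket(bucket).prefix(prefix).build();
        for (S3Object obj : s3.listObjectsV2Paginator(req).contents()) {
          String fileName = obj.key().substring(prefix.length());
          Path local = cacheDir.resolve(fileName);
          // Skip files already present and complete in the local cache.
          if (Files.exists(local) && Files.size(local) == obj.size()) {
            continue;
          }
          Files.deleteIfExists(local); // toFile() requires a non-existing target
          s3.getObject(GetObjectRequest.builder()
                  .bucket(bucket).key(obj.key()).build(),
              ResponseTransformer.toFile(local));
        }
      }
    }

A real implementation would of course fetch only the files referenced by a consistent commit point rather than a flat listing of the prefix, but the shape is the same: compare the local cache against the single per-shard copy and pull only what's missing.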
The current state of our code is:
- It's a fork (it was shared as a branch for a while but eventually removed).
- We introduced a new replica type that is explicitly based on remote Blob storage (and otherwise similar to TLOG) and that guards against concurrent writes to S3. When shard leadership changes in SolrCloud, there can be an overlap period with two nodes acting as leaders, which can cause corruption with a naive implementation of remote storage at shard level. (See the P.S. below for a sketch of one way to fence such overlapping writers.)
- Currently we only push constructed segments to S3, which forces committing every batch before acknowledging it to the client: we can't rely on a node's transaction log, given it would be lost upon node restart, the node having no persistent storage.

We're considering improving this approach by making the transaction log a shard-level abstraction (rather than a replica/node abstraction) and storing it in S3 as well: one transaction log per shard, not per replica. This would allow indexing to not commit on every batch, speed up /update requests, push the constructed segments to S3 asynchronously, and guarantee data durability while still keeping nodes stateless (so they can be shut down at any time, in any number, without data loss and without having to restart them to recover data only they can access).

If there is interest in such a contribution to SolrCloud, then we might carry the next phase of the work upstream (initially in a branch, with the intention to eventually merge). If there is no interest, it's easier for us to keep working on our fork, which allows taking shortcuts and ignoring features of SolrCloud we do not use (for example, replacing the existing transaction log with a shard-level transaction log rather than maintaining both).

Feedback welcome! I can present the idea in more detail in a future committer/contributor meeting if there's interest.

Thanks,
Ilan
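P.S. For those curious about the concurrent-writes guard mentioned above, here is a minimal, hypothetical sketch of one way to fence overlapping leaders (not necessarily what our fork does): segment files are first uploaded to S3 under unique names, then made visible by a conditional, versioned update of a per-shard metadata node in ZooKeeper. A stale leader's conditional update fails, so the files it uploaded are never referenced. The ZK path layout and class names are illustrative:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    /** Hypothetical sketch: make newly uploaded shard files visible only via a
     *  compare-and-swap on a per-shard ZooKeeper node, fencing stale leaders. */
    public class ShardPushFencer {
      private final ZooKeeper zk;

      public ShardPushFencer(ZooKeeper zk) {
        this.zk = zk;
      }

      /** Returns true if this node won the race and the new shard metadata
       *  (e.g. the list of live segment files on S3) is now visible. */
      public boolean publish(String shardName, byte[] newShardMetadata)
          throws KeeperException, InterruptedException {
        String path = "/shared-store/" + shardName; // illustrative ZK layout
        Stat stat = new Stat();
        zk.getData(path, false, stat); // read the current metadata version
        try {
          // Versioned write: succeeds only if nobody published since our read.
          zk.setData(path, newShardMetadata, stat.getVersion());
          return true;
        } catch (KeeperException.BadVersionException e) {
          // An overlapping leader moved the shard forward first; abandon our
          // uploaded files (unreferenced, so safe to garbage collect later).
          return false;
        }
      }
    }

The key property is that the per-file S3 content is immutable and write-once; only the small metadata pointer is contended, and ZooKeeper's versioned setData provides the compare-and-swap.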