[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

Andy Vuong (Jira) Fri, 10 Jul 2020 10:45:56 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155639#comment-17155639
 ]


Andy Vuong commented on SOLR-13101:
-----------------------------------

Hey Solr Community!

I wanted to share an update on this JIRA. We've recently decided to continue 
work on this project internally, both for convenience and because we are 
working through new challenges and revisiting our design of shared storage. 
Some of these challenges include:
 * Forcing commits and pushes to Blob on each (sub) indexing batch makes things 
expensive (paying for traffic to S3) and less efficient from a SolrCloud 
perspective (too many small commits, merge cost),
 * Delaying the ack to the client on an indexing batch until the data is 
indexed and a segment is created and pushed to S3 slows things down 
considerably,
 * Transaction logs are used heavily in SolrCloud code. Having nodes with 
non-persistent storage is challenging (for example, in the post-shard-split 
recovery mode)

Work is progressing but we'll no longer use our feature branch for this work. 
We'll be sure to keep the community updated in the future as we progress on 
addressing these issues.

> Shared storage support in SolrCloud
> -----------------------------------
>
>                 Key: SOLR-13101
>                 URL: https://issues.apache.org/jira/browse/SOLR-13101
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>          Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, Google Cloud Storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas; a single replica could be 
> common for write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
