[ 
https://issues.apache.org/jira/browse/SOLR-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263440#comment-15263440
 ] 

David Smiley commented on SOLR-9038:
------------------------------------

I _think_ we may be understanding each other again.  We might not want to call 
this a snapshot simply because there are remnants of that naming within 
replication & backup (e.g. "Snapshooter").  Instead I propose naming it closer 
to what it actually is implemented as -- like a "commit lease" or "snapshot 
commit" (the operative word being "commit").  Of course others may want to 
comment; I have no conviction.  For now let's continue with "snapshot commit" as 
it retains both words, and is a decent name, I think.

bq. As you mentioned in your earlier comments, we can use the "commit" workflow 
to create a named snapshot.
bq. Does that make sense?

Yes!

bq. How would the "list snapshots" and "delete snapshot" APIs look like? Do we 
need to provide them just at the core level or at the collection level as well?

I think the data to be listed is fundamentally at the core, so certainly the 
core level.  But a collection level API is needed -- it could simply take the 
distinct union list from asking each leader.  It could list the snapshot 
commits _not_ common to all in a separate list, if there's any utility in that?
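
A rough sketch of that collection-level merge, in plain Java.  The class and 
method names are made up for illustration and nothing here is an existing Solr 
API; it assumes each shard leader has already reported its set of snapshot 
commit names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for a collection-level "list snapshot commits" call; not Solr code. */
final class SnapshotCommitListing {

    /** Distinct union of the snapshot commit names reported by every shard leader. */
    static Set<String> union(Map<String, Set<String>> namesByLeader) {
        Set<String> all = new TreeSet<>();
        for (Set<String> names : namesByLeader.values()) {
            all.addAll(names);
        }
        return all;
    }

    /** Names reported by at least one leader but missing from at least one other. */
    static Set<String> notCommonToAll(Map<String, Set<String>> namesByLeader) {
        Set<String> partial = union(namesByLeader);
        // Start from the union and keep only names that every leader reports.
        Set<String> common = new TreeSet<>(partial);
        for (Set<String> names : namesByLeader.values()) {
            common.retainAll(names);
        }
        partial.removeAll(common);
        return partial;
    }
}
```

The second method is only worth exposing if the separate "not common to all" 
list turns out to be useful.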

bq. Would we allow "destructive" operations (e.g. delete replica/shard) when we 
have one or more snapshots?

I think so.  Not doing so might be a pain, and it's not evident to me it's 
important to worry about it.

bq. It seems to me that the "commit" request will be executed by all replicas 
for a given collection. What should happen when a "commit" request can not be 
processed by a replica (since it may be down) ? We may need to ensure that 
during the replica "recovery" it also fetches the information about commit 
metadata.

Hmm; good point. :-(  That might be a PITA unfortunately.  Perhaps a snapshot 
commit needs to block for all replicas to _not_ be in recovery first?  That 
seems much easier than trying to get replicas in recovery to somehow get 
IndexCommit data which I think is kinda impossible / infeasible.  However, 
another bad situation is when there are already successful snapshot commits, 
and then for whatever reason a replica goes into full recovery and thus only 
grabs the latest commit (which might not even be a snapshot commit).  So 
perhaps recovering replicas need to ask to replicate not just the latest 
commit but all snapshot commits as well.  Seems pretty doable.  One would hope 
that the commits would share lots of big segments, but they might not.  This 
scenario is possible, which is too bad, but I don't think it should block an 
initial release.
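
To make the recovery idea concrete, here is a minimal sketch of picking which 
commits a recovering replica would ask for.  It assumes commits can be 
identified by generation number and that the leader can supply a map of 
snapshot commit name to generation; both the class and that map are 
hypothetical, not existing Solr code.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical selection logic for a recovering replica; not Solr code. */
final class RecoveryCommitSelection {

    /**
     * @param latestGeneration    generation of the leader's newest commit
     * @param snapshotGenerations snapshot commit name -> commit generation (assumed metadata)
     * @return every generation to replicate, de-duplicated and in ascending order
     */
    static SortedSet<Long> generationsToFetch(long latestGeneration,
                                              Map<String, Long> snapshotGenerations) {
        SortedSet<Long> wanted = new TreeSet<>(snapshotGenerations.values());
        // The latest commit is always needed, whether or not it is a snapshot commit.
        wanted.add(latestGeneration);
        return wanted;
    }
}
```

Using a set means the latest commit is fetched only once when it is itself a 
snapshot commit.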


> Ability to create/delete/list snapshots for a solr collection
> -------------------------------------------------------------
>
>                 Key: SOLR-9038
>                 URL: https://issues.apache.org/jira/browse/SOLR-9038
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Hrishikesh Gadre
>
> Currently work is under-way to implement backup/restore API for Solr cloud 
> (SOLR-5750). SOLR-5750 is about providing an ability to "copy" index files 
> and collection metadata to a configurable location. 
> In addition to this, we should also provide a facility to create "named" 
> snapshots for Solr collection. Here by "snapshot" I mean configuring the 
> underlying Lucene IndexDeletionPolicy to not delete a specific commit point 
> (e.g. using PersistentSnapshotIndexDeletionPolicy). This should not be 
> confused with SOLR-5340 which implements core level "backup" functionality.
> The primary motivation of this feature is to decouple recording/preserving a 
> known consistent state of a collection from actually "copying" the relevant 
> files to a physically separate location. This decoupling has a number of 
> advantages:
> - We can use specialized data-copying tools for transferring Solr index 
> files. e.g. in Hadoop environment, typically 
> [distcp|https://hadoop.apache.org/docs/r1.2.1/distcp2.html] tool is used to 
> copy files from one location to other. This tool provides various options to 
> configure degree of parallelism, bandwidth usage as well as integration with 
> different types and versions of file systems (e.g. AWS S3, Azure Blob store 
> etc.)
> - This separation of concern would also help Solr to focus on the key 
> functionality (i.e. querying and indexing) while delegating the copy 
> operation to the tools built for that purpose.
> - Users can decide if/when to copy the data files as opposed to creating a 
> snapshot. e.g. a user may want to create a snapshot of a collection before 
> making an experimental change (e.g. updating/deleting docs, schema change 
> etc.). If the experiment is successful, they can delete the snapshot (without 
> having to copy the files). If the experiment fails, they can copy the 
> files associated with the snapshot and restore.
> Note that Apache Blur project is also providing a similar feature 
> [BLUR-132|https://issues.apache.org/jira/browse/BLUR-132]


