[ 
https://issues.apache.org/jira/browse/SOLR-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059076#comment-18059076
 ] 

Jason Gerlowski commented on SOLR-12999:
----------------------------------------

bq. I disagree with "data loss" – it's up to the leader election / "shard 
terms" to deem who is leader eligible.

Agreed - the replica doing this recovery is not leader-eligible, by definition. 
If it were leader-eligible then it wouldn't be doing a full-index-fetch, right?

But there's still *some* data in that replica, and the code in question here 
deletes that data based on the promise/expectation that a more complete copy 
will be redownloaded "soon".  And if anything happens to the current leader, 
"soon" may never come. 🤷

If you don't like "data loss" as a way to describe that, I'm open to other 
phrases.  But hopefully we can agree that there are scenarios where this 
preemptive-delete codepath takes us from having a partial copy of the index to 
having nothing at all?  And while I don't support it or consider it a best 
practice, I've seen users rely on those "it should have most of the data" 
replicas to get themselves out of a pinch.

bq. It's even more of a shame that the implementation is 
buggy/unreliable/untested.  So I very much sympathize with removing it on those 
grounds.

Yeah - the unreliability is what drives my concern here.  It would be less of a 
"red flag" to me if it had lots of tests, if the bugs weren't 100% 
reproducible, if it was documented, etc.

> Index replication could delete segments first
> ---------------------------------------------
>
>                 Key: SOLR-12999
>                 URL: https://issues.apache.org/jira/browse/SOLR-12999
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java)
>            Reporter: David Smiley
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 8.1
>
>         Attachments: SOLR-12999.patch, SOLR-12999.patch
>
>
> Index replication could optionally delete files that it knows will not be 
> needed _first_.  This would reduce disk capacity requirements of Solr, and it 
> would reduce some disk fragmentation when space gets tight.
> Solr (IndexFetcher) already grabs the remote file list, and it could see 
> which files it has locally, then delete the others.  Today it asks Lucene to 
> {{deleteUnusedFiles}} at the end.  This new mode would probably only be 
> useful if there is no SolrIndexSearcher open, since an open searcher would 
> prevent the removal of files.
> The motivating scenario is a SolrCloud replica that is going into full 
> recovery.  It ought to not be fielding searches.  The code changes would not 
> depend on SolrCloud though.
> This option would have some danger the user should be aware of.  If the 
> replication fails, leaving the local files incomplete/corrupt, the only 
> recourse is to try full replication again.  You can't just give up and field 
> queries.
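
For readers skimming the thread, the "delete first" step described in the issue can be sketched roughly as a set difference over file names.  This is only an illustration, not the IndexFetcher code: the class name {{PreemptiveDeleteSketch}}, the method {{filesToDeleteFirst}}, and the example file names are all hypothetical, and comparing by name alone is a simplifying assumption.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Solr code: decide which local index files could be
// removed before the fetch, given the file list reported by the remote side.
public class PreemptiveDeleteSketch {

    // Returns the local file names that do not appear in the remote file list.
    // Assumption: files are compared by name only.
    public static Set<String> filesToDeleteFirst(Set<String> localFiles,
                                                 Set<String> remoteFiles) {
        Set<String> stale = new TreeSet<>(localFiles); // sorted copy of the local names
        stale.removeAll(remoteFiles);                  // keep only what the remote lacks
        return stale;
    }

    public static void main(String[] args) {
        Set<String> local = Set.of("_0.cfs", "_1.cfs", "segments_2");
        Set<String> remote = Set.of("_1.cfs", "_2.cfs", "segments_3");
        // _1.cfs exists on both sides, so it is kept; the other local files are stale.
        System.out.println(filesToDeleteFirst(local, remote)); // prints [_0.cfs, segments_2]
    }
}
```

Once the stale files are gone, the local copy stays incomplete until the remaining files have been fetched, which is the window of risk discussed in the comment above.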



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
