[ 
https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816343#comment-16816343
 ] 

Erick Erickson commented on SOLR-13396:
---------------------------------------

This is a sticky wicket. Say I have a 200-node cluster hosting 1,000 
collections. Keeping track of all the cores that aren't _really_ part of a 
collection and manually cleaning them up is an onerous task.

Yet it's pretty horrible that with one mistake (someone edits the startup 
script, messes up the ZK parameter, pushes it out to all the Solr nodes, and 
restarts the cluster) one could delete everything everywhere.

More thinking out loud, and I have no clue how it'd interact with autoscaling. 
It seems odd, but we _could_ use ZooKeeper to keep a list of potential cores to 
delete and have

1> a way to view/list them

2> a button to push, or a collections API command to issue, or... to say 
"delete them".

3> some kind of very visible warning that this list is not empty.

"But wait!!" you cry, "The whole problem is that you can't get to ZooKeeper in 
the first place!" Which is perfectly fine, since we're presupposing a bogus ZK 
address anyway. That way the cores to delete would be tied to the proper ZK 
instance; when the ZK address was corrected, there wouldn't be anything in the 
queue. I think I like this a little better than some sort of 
scheduled-in-the-future event; for people who cared, a cron job that issued 
the collections API call could be set up. One could even attach an expiration 
date to the znode for each potential core to delete.
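To make the idea above concrete, here is a minimal sketch of such a pending-delete registry: orphaned cores are registered with an expiration date, can be listed, raise a visible warning while non-empty, and are only ever removed by an explicit confirm command. Everything here is hypothetical — the class name, method names, and the in-memory map standing in for znodes under some ZooKeeper path are my invention, not existing Solr API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pending-core-delete registry. In a real
// implementation the entries would be znodes under a path in the same
// ZK ensemble the node connected to; a map stands in for them here.
public class PendingCoreDeletes {
    private final Map<String, Instant> pending = new LinkedHashMap<>();

    // A core not found in the clusterstate is registered, never deleted
    // outright; expiresAt is the attached expiration date.
    public void register(String coreName, Instant expiresAt) {
        pending.put(coreName, expiresAt);
    }

    // 1> a way to view/list them
    public List<String> list() {
        return new ArrayList<>(pending.keySet());
    }

    // 3> a very visible warning that this list is not empty
    public boolean needsAttention() {
        return !pending.isEmpty();
    }

    // 2> an explicit "delete them" command: only entries whose expiration
    // date has passed are eligible, and only on operator request.
    public List<String> confirmDeletes(Instant now) {
        List<String> deleted = new ArrayList<>();
        pending.entrySet().removeIf(e -> {
            if (!e.getValue().isAfter(now)) {
                deleted.add(e.getKey()); // real code would remove core data here
                return true;
            }
            return false;
        });
        return deleted;
    }
}
```

Note how this matches the bogus-ZK scenario: the registry lives in whatever ensemble the node actually connected to, so entries created against a wrong ZK address never appear in the correct ensemble's queue once the address is fixed.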

 

> SolrCloud will delete the core data for any core that is not referenced in 
> the clusterstate
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13396
>                 URL: https://issues.apache.org/jira/browse/SOLR-13396
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.3.1, 8.0
>            Reporter: Shawn Heisey
>            Priority: Major
>
> SOLR-12066 is an improvement designed to delete core data for replicas that 
> were deleted while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores 
> that are not referenced in the ZK clusterstate.  If all the ZK data gets 
> deleted or the Solr instance is pointed at a ZK ensemble with no data, it 
> will proceed to delete all of the cores in the solr home, with no possibility 
> of recovery.
> I do not think that Solr should ever delete core data unless an explicit 
> DELETE action has been made and the node is operational at the time of the 
> request.  If a core exists during startup that cannot be found in the ZK 
> clusterstate, it should be ignored (not started) and a helpful message should 
> be logged.  I think that message should probably be at WARN so that it shows 
> up in the admin UI logging tab with default settings.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
