[ 
https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682660#comment-17682660
 ] 

Colvin Cowie commented on SOLR-13396:
-------------------------------------

{quote}It seems the scenario to guard against is the wrong ZK being referenced 
(for whatever reason, to include ZK getting wiped clean). In such a scenario 
every core on disk will not be linked to a replica in ZK.
{quote}
That does sound fair in general, though I think there are still other scenarios 
where that isn't the case.

For example, say I have a test system *A* and a prod system *B*, and system 
*B* has some extra collections in it. If I accidentally connect Solr *B* to 
ZooKeeper *A*, which only knows about a subset of *B*'s collections, then the 
cores for the rest get wiped. (Granted, there are other issues there, as I 
shouldn't really be able to connect to prod by mistake, but it happens.)
Or perhaps if I fail to properly restore a ZooKeeper backup after some sort of 
corruption?
{quote}Configuration options to toggle this are okay but it'd be nice to 
improve the default behavior because people won't find this configuration 
option and use it unless they've been burned :)
{quote}
I definitely agree. My preference would be to default to not automatically 
deleting cores, and instead set configuration to enable that. But as that's a 
behavior change, I thought I would start with a flag to disable it as being 
less contentious.

Making the code itself better at telling a misconfiguration apart from a core 
that was left behind by accident would be great as well, but I don't know the 
code well enough to address that myself. And I would still be inclined to make 
the deletion optional in any case (ideally with an opt-in), so I think there's 
value in having a config option with or without improvements to the logic.
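
A minimal sketch of the opt-in behaviour discussed above, in plain Java. This is not Solr's code: the class `OrphanCorePolicy`, the flag `deleteUnknownCores`, and the `Action` enum are hypothetical names made up for illustration, and the "no core on disk matches ZK" check is only one possible heuristic for spotting a wrong or empty ZooKeeper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Solr.
class OrphanCorePolicy {
    enum Action { LOAD, SKIP_AND_WARN, DELETE }

    // Assumed opt-in flag: false means cores are never deleted automatically.
    private final boolean deleteUnknownCores;

    OrphanCorePolicy(boolean deleteUnknownCores) {
        this.deleteUnknownCores = deleteUnknownCores;
    }

    // Decide what to do with each core found on disk at startup.
    Map<String, Action> decide(List<String> coresOnDisk, Set<String> coresInClusterState) {
        // If no core on disk is known to ZK, assume the wrong or an empty ZK
        // is being referenced and refuse to delete anything.
        boolean noneKnown = coresOnDisk.stream().noneMatch(coresInClusterState::contains);

        Map<String, Action> result = new LinkedHashMap<>();
        for (String core : coresOnDisk) {
            if (coresInClusterState.contains(core)) {
                result.put(core, Action.LOAD);
            } else if (deleteUnknownCores && !noneKnown) {
                result.put(core, Action.DELETE);
            } else {
                // Leave the data alone; the caller would log a WARN here.
                result.put(core, Action.SKIP_AND_WARN);
            }
        }
        return result;
    }
}
```

Note that the heuristic alone does not cover the test/prod example above: there some cores do match, so only leaving the flag at its default of `false` keeps the extra cores from being deleted.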

> SolrCloud will delete the core data for any core that is not referenced in 
> the clusterstate
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13396
>                 URL: https://issues.apache.org/jira/browse/SOLR-13396
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 7.3.1, 8.0
>            Reporter: Shawn Heisey
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> SOLR-12066 is an improvement designed to delete core data for replicas that 
> were deleted while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores 
> that are not referenced in the ZK clusterstate.  If all the ZK data gets 
> deleted or the Solr instance is pointed at a ZK ensemble with no data, it 
> will proceed to delete all of the cores in the solr home, with no possibility 
> of recovery.
> I do not think that Solr should ever delete core data unless an explicit 
> DELETE action has been made and the node is operational at the time of the 
> request.  If a core exists during startup that cannot be found in the ZK 
> clusterstate, it should be ignored (not started) and a helpful message should 
> be logged.  I think that message should probably be at WARN so that it shows 
> up in the admin UI logging tab with default settings.



