[
https://issues.apache.org/jira/browse/SOLR-13396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682660#comment-17682660
]
Colvin Cowie commented on SOLR-13396:
-------------------------------------
{quote}It seems the scenario to guard against is the wrong ZK being referenced
(for whatever reason, to include ZK getting wiped clean). In such a scenario
every core on disk will not be linked to a replica in ZK.
{quote}
That does sound fair in general, though I think there are still other scenarios
where that isn't the case.
For example, say I have a test system *A* and a prod system {*}B{*}, and system
*B* has some extra collections in it. If I accidentally connect Solr *B* to
ZooKeeper {*}A{*}, which has only a subset of {*}B{*}'s collections, the rest
get wiped. (Granted, there are other issues there, as I shouldn't really be able
to connect to prod by mistake, but it happens.)
Or what if I fail to properly restore a ZooKeeper backup after some sort of
corruption?
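To make the gap concrete, here is a minimal sketch of the "every core on disk is unlinked" guard from the quote, assuming the check simply compares core names found on disk with core names referenced in the cluster state. The class and method names (`UnknownCoreGuard`, `looksLikeWrongZooKeeper`) are hypothetical and are not Solr code. The guard catches an empty or wiped ZooKeeper, but it does not catch the partial-overlap case described above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: not Solr's actual startup logic.
public final class UnknownCoreGuard {

    /**
     * Returns true when none of the cores found on disk are referenced in the
     * cluster state, which suggests a wrong or empty ZooKeeper rather than
     * replicas that were deleted while the node was down.
     */
    static boolean looksLikeWrongZooKeeper(List<String> coresOnDisk, Set<String> coresInClusterState) {
        if (coresOnDisk.isEmpty()) {
            return false; // nothing on disk, so nothing is at risk
        }
        for (String core : coresOnDisk) {
            if (coresInClusterState.contains(core)) {
                return false; // at least one local core is linked to a replica in ZK
            }
        }
        return true; // every local core is unknown to ZK
    }

    public static void main(String[] args) {
        // Illustrative core names in the usual <collection>_shardN_replica_nN form.
        List<String> onDisk = List.of("orders_shard1_replica_n1", "users_shard1_replica_n1");

        // Empty or wiped ZooKeeper: the guard fires, so deletion could be skipped.
        System.out.println(looksLikeWrongZooKeeper(onDisk, Set.of()));

        // Prod node B pointed at test ZooKeeper A, which shares only "users":
        // the guard does not fire, so the "orders" core still looks orphaned.
        System.out.println(looksLikeWrongZooKeeper(onDisk, Set.of("users_shard1_replica_n1")));
    }
}
```

Running `main` prints `true` and then `false`. That second result is exactly the case where a heuristic alone still deletes production data.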
{quote}Configuration options to toggle this are okay but it'd be nice to
improve the default behavior because people won't find this configuration
option and use it unless they've been burned :)
{quote}
I definitely agree. My preference would be to not delete cores automatically by
default, and instead require configuration to enable that. But as that's a
behavior change, I thought I would start with a flag to disable it, as that
seemed less contentious.
Making the code itself better at distinguishing a misconfiguration from a core
that was left behind by accident would be great as well, but I don't know the
code well enough to address that myself. And I would still be inclined to make
the deletion optional in any case (ideally opt-in), so I think there's value in
having a config option with or without improvements to the logic?
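A minimal sketch of the opt-in behavior being argued for, combined with the reporter's suggestion below (do not start an unknown core, keep its data, and log a warning). The flag name `deleteUnknownCores`, the `UnknownCoreHandling` class, and the `Action` enum are all hypothetical illustrations, not existing Solr settings or APIs. `java.util.logging` at WARNING level stands in for Solr's own logging at WARN.

```java
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch only: "deleteUnknownCores" is an illustrative flag, not a real Solr option.
public final class UnknownCoreHandling {

    private static final Logger LOG = Logger.getLogger(UnknownCoreHandling.class.getName());

    enum Action { LOAD, SKIP, DELETE }

    static Action decide(String core, Set<String> coresInClusterState, boolean deleteUnknownCores) {
        if (coresInClusterState.contains(core)) {
            return Action.LOAD; // core is linked to a replica in ZK, start it normally
        }
        if (deleteUnknownCores) {
            return Action.DELETE; // opt-in cleanup, in the spirit of SOLR-12066
        }
        // Default: do not start the core, leave its data on disk, and warn.
        LOG.warning("Core " + core + " is not referenced in the cluster state; "
                + "not loading it and leaving its data on disk");
        return Action.SKIP;
    }

    public static void main(String[] args) {
        Set<String> clusterState = Set.of("users_shard1_replica_n1");
        List<String> onDisk = List.of("users_shard1_replica_n1", "orders_shard1_replica_n1");
        boolean deleteUnknownCores = false; // default: never delete automatically

        for (String core : onDisk) {
            System.out.println(core + " -> " + decide(core, clusterState, deleteUnknownCores));
        }
    }
}
```

With the default of `false`, the unknown `orders` core is reported as `SKIP` and a warning is logged. Deletion happens only when an operator explicitly turns the flag on.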
> SolrCloud will delete the core data for any core that is not referenced in
> the clusterstate
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-13396
> URL: https://issues.apache.org/jira/browse/SOLR-13396
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 7.3.1, 8.0
> Reporter: Shawn Heisey
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> SOLR-12066 is an improvement designed to delete core data for replicas that
> were deleted while the node was down -- better cleanup.
> In practice, that change causes SolrCloud to delete all core data for cores
> that are not referenced in the ZK clusterstate. If all the ZK data gets
> deleted or the Solr instance is pointed at a ZK ensemble with no data, it
> will proceed to delete all of the cores in the solr home, with no possibility
> of recovery.
> I do not think that Solr should ever delete core data unless an explicit
> DELETE action has been made and the node is operational at the time of the
> request. If a core exists during startup that cannot be found in the ZK
> clusterstate, it should be ignored (not started) and a helpful message should
> be logged. I think that message should probably be at WARN so that it shows
> up in the admin UI logging tab with default settings.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)