Tódor Dávid Nyeste created SOLR-16888:
-----------------------------------------

             Summary: Data corruption causes snapshots with previously used 
names to be unable to be created/deleted
                 Key: SOLR-16888
                 URL: https://issues.apache.org/jira/browse/SOLR-16888
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 8.4.1
            Reporter: Tódor Dávid Nyeste


Solr stores snapshot metadata on the filesystem, and ZooKeeper stores 
information about the same snapshots. If ZooKeeper's data gets corrupted for 
some reason, the two copies fall out of sync. No further snapshot can then be 
created with the same name: 
[CreateSnapshotCmd|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateSnapshotCmd.java]
 begins creating a new ZooKeeper entry, but the operation fails in the 
snapshot() method of 
[SolrSnapshotMetaDataManager|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/core/src/java/org/apache/solr/core/snapshots/SolrSnapshotMetaDataManager.java#L178],
 because the filesystem already contains a snapshot with this name and the 
nameToDetailsMapping map obtains its data from there. Attempting to delete 
the snapshot with 
[DeleteSnapshotCmd|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/core/src/java/org/apache/solr/cloud/api/collections/DeleteSnapshotCmd.java]
 successfully removes the ZooKeeper entry. However, since it is a failed 
snapshot with no cores assigned, the command cannot remove the metadata on 
the filesystem (yet it still returns a misleading "successfully deleted" 
message).
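
To make the sequence easier to follow, here is a minimal toy model of the 
desync. It is a sketch, not Solr code: the class SnapshotDesyncSketch, its 
two maps, and the create/delete methods are hypothetical stand-ins for the 
ZooKeeper entry and the per-core filesystem metadata, and the real commands 
do considerably more.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Solr APIs.
class SnapshotDesyncSketch {
    // Stand-in for ZooKeeper: snapshot name -> cores recorded for it.
    final Map<String, Set<String>> zk = new HashMap<>();
    // Stand-in for filesystem metadata: core name -> snapshot names on disk.
    final Map<String, Set<String>> fs = new HashMap<>();

    SnapshotDesyncSketch(String... cores) {
        for (String core : cores) {
            fs.put(core, new HashSet<>());
        }
    }

    // The ZooKeeper entry is started first, then the per-core step runs.
    boolean create(String name) {
        zk.put(name, new HashSet<>());
        for (Set<String> namesOnDisk : fs.values()) {
            if (namesOnDisk.contains(name)) {
                return false; // name already on disk: entry is left with no cores
            }
        }
        for (Map.Entry<String, Set<String>> core : fs.entrySet()) {
            core.getValue().add(name);
            zk.get(name).add(core.getKey());
        }
        return true;
    }

    // Only the cores listed in the ZooKeeper entry are cleaned up.
    String delete(String name) {
        Set<String> cores = zk.remove(name);
        if (cores != null) {
            for (String core : cores) {
                fs.get(core).remove(name);
            }
        }
        return "successfully deleted";
    }

    public static void main(String[] args) {
        SnapshotDesyncSketch s = new SnapshotDesyncSketch("core1", "core2");
        s.create("snap");
        s.zk.remove("snap"); // simulate the loss of the ZooKeeper entry
        System.out.println(s.create("snap")); // false
        System.out.println(s.delete("snap")); // successfully deleted
        System.out.println(s.fs.get("core1").contains("snap")); // true
    }
}
```

In this model the second create() leaves behind a ZooKeeper entry with no 
cores, so delete() has nothing to iterate over and the stale name stays on 
disk even though the returned message reports success.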


A possible workaround is to delete the metadata on the filesystem manually, 
but if the collection has more than one snapshot, editing the binary file to 
remove only the corrupt entry becomes complicated. Would it be feasible to add 
a "force delete" command that scans all the cores assigned to the collection 
for snapshot metadata with the given name and deletes it? Alternatively, would 
it be possible to improve the logging so that it represents the situation more 
accurately? For example, the delete operation could warn that the snapshot is 
a failed one with no cores assigned to it, and that no filesystem metadata 
will therefore be deleted.
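
As a rough illustration of both suggestions, the sketch below models 
per-core filesystem metadata as a plain map. Everything in it is 
hypothetical: ForceDeleteSketch, forceDelete, and deleteMessage are not 
existing Solr classes or methods, and a real implementation would have to 
go through the cores' actual snapshot metadata rather than an in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the proposal; not an existing Solr API.
class ForceDeleteSketch {
    // Scans every core of the collection and removes the named snapshot's
    // metadata wherever it is found, ignoring what ZooKeeper recorded.
    static List<String> forceDelete(Map<String, Set<String>> fsByCore, String name) {
        List<String> cleaned = new ArrayList<>();
        for (Map.Entry<String, Set<String>> core : fsByCore.entrySet()) {
            if (core.getValue().remove(name)) {
                cleaned.add(core.getKey());
            }
        }
        return cleaned;
    }

    // Alternative: report accurately when a normal delete had nothing to clean.
    static String deleteMessage(String name, Set<String> coresInZk) {
        if (coresInZk.isEmpty()) {
            return "WARN: snapshot '" + name
                + "' has no cores assigned; no filesystem metadata was deleted";
        }
        return "Snapshot '" + name + "' deleted from " + coresInZk.size() + " core(s)";
    }

    public static void main(String[] args) {
        // TreeMap keeps the cores in a predictable order for printing.
        Map<String, Set<String>> fsByCore = new TreeMap<>();
        fsByCore.put("core1", new HashSet<>(Arrays.asList("snap", "other")));
        fsByCore.put("core2", new HashSet<>(Arrays.asList("snap")));
        System.out.println(forceDelete(fsByCore, "snap")); // [core1, core2]
        System.out.println(deleteMessage("snap", Collections.emptySet()));
    }
}
```

The force-delete variant deliberately does not consult the ZooKeeper entry, 
which is what would let it clean up a snapshot whose entry lists no cores.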



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
