[
https://issues.apache.org/jira/browse/SOLR-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363174#comment-17363174
]
Roy Perkins commented on SOLR-15371:
------------------------------------
[~gerlowskija] Yes, I have verified that the mount exists at the same location
on all the servers, and nothing unusual is happening on the network. I can
re-run the failing backup and watch it fail repeatedly until I restart the Solr
node that hosts the leader for the failing shard. After that restart the backup
completes normally, without my touching anything on the network or the NFS mounts.
> Backups randomly fail sometimes
> -------------------------------
>
> Key: SOLR-15371
> URL: https://issues.apache.org/jira/browse/SOLR-15371
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 8.5.2, 8.8.2
> Reporter: Roy Perkins
> Priority: Major
>
> Hi, we have an issue where one shard sometimes fails to back up, possibly due
> to a race condition between creating the folder and starting the backup. When
> this happens, we have to restart the first server in a shard to get the
> backup to succeed again. The cluster backs up to a shared NFS mount. Four
> times out of five the backup runs without issues (another collection on the
> same servers is even backed up later in the morning and succeeds). Below is
> the error I get.
> {code:java}
> "Response":"Failed to backup core=slprod_shard4_replica_n6 because
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't
> exist: file:///mnt/solr_backups/slprod/slprod-04-25-2021. Note that
> Backup/Restore of a SolrCloud collection requires a shared file system
> mounted at the same path on all nodes!"},
> {code}
> And below is the command I use to run the backup (with the bash variables set
> earlier in the script):
> {code:java}
> curl -s "http://localhost:8983/solr/admin/collections?action=BACKUP&name=${COLLECTION}-${DATE}&collection=${COLLECTION}&location=${BACKUP_PATH}&async=1000"
> {code}
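Because the command above passes async=1000, the curl call itself only confirms that the request was submitted; the outcome has to be read back through the Collections API REQUESTSTATUS action. The following is a minimal sketch of how a backup script might poll for that outcome. SOLR_URL, the function names, and the 10-second poll interval are assumptions for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch only: SOLR_URL, the function names, and the poll interval are assumptions.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the REQUESTSTATUS URL for a given async request id.
status_url() {
  printf '%s/admin/collections?action=REQUESTSTATUS&requestid=%s&wt=json' "$SOLR_URL" "$1"
}

# Read a JSON response on stdin and print the first "state" value found.
extract_state() {
  sed -n 's/.*"state" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# Poll until the request is no longer submitted/running.
# Returns 0 only if the final state is "completed".
wait_for_backup() {
  local id="$1" state
  while :; do
    state="$(curl -s "$(status_url "$id")" | extract_state)"
    case "$state" in
      completed) return 0 ;;
      submitted|running) sleep 10 ;;
      *) echo "backup request $id ended in state: ${state:-unknown}" >&2
         return 1 ;;
    esac
  done
}
```

Calling `wait_for_backup 1000` after the BACKUP request would let the script exit non-zero when a shard fails, so the failure can be detected and logged on the night it happens.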