[ https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527505#comment-16527505 ]

Jan Høydahl edited comment on SOLR-12523 at 6/29/18 11:57 AM:
--------------------------------------------------------------

Tested the patch, and it passes precommit. Here's the new error text from the 
API when attempting a backup across two nodes that do NOT share the backup 
drive. The same error is also logged on both nodes. Will commit soon.
{noformat}
{
  "responseHeader": {
    "status": 500,
    "QTime": 135
  },
  "failure": {
    "10.5.0.5:8983_solr": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://10.5.0.5:8983/solr: Failed to backup core=coll2_shard1_replica_n2 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///back/myback2. Backup folder must already exist. Note also that Backup/Restore of a SolrCloud collection requires a shared file system mounted at the same path on all nodes!"
  },
  "Operation backup caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not backup all replicas",
  "exception": {
    "msg": "Could not backup all replicas",
    "rspCode": 500
  },
  "error": {
    "metadata": [
      "error-class", "org.apache.solr.common.SolrException",
      "root-error-class", "org.apache.solr.common.SolrException"
    ],
    "msg": "Could not backup all replicas",
    "trace": "org.apache.solr.common.SolrException: Could not backup all replicas\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
[...snip...]
  }
}
{noformat}
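A client calling the Collections API can pull the per-node root cause out of the "failure" section of a response shaped like the one above. A minimal Python sketch, assuming that shape ({"failure": {node: message}}) and that each message puts the root cause after the word "because", as these do; `failure_reasons` is a hypothetical helper, not part of SolrJ, and the sample is an abbreviated copy of the response above.

```python
import json
import re


def failure_reasons(response_body: str) -> dict[str, str]:
    """Hypothetical helper: map each node in "failure" to its root cause.

    Assumes the response shape shown above: {"failure": {node: message}}.
    """
    response = json.loads(response_body)
    reasons = {}
    for node, message in response.get("failure", {}).items():
        # Keep only the text after "because"; the prefix is the
        # RemoteSolrException wrapper and the "Failed to backup core=..." intro.
        match = re.search(r"because (.*)", message, re.DOTALL)
        reasons[node] = match.group(1) if match else message
    return reasons


# Abbreviated copy of the response above.
example = json.dumps({
    "responseHeader": {"status": 500, "QTime": 135},
    "failure": {
        "10.5.0.5:8983_solr": (
            "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:"
            "Error from server at http://10.5.0.5:8983/solr: Failed to backup "
            "core=coll2_shard1_replica_n2 because org.apache.solr.common.SolrException: "
            "Directory to contain snapshots doesn't exist: file:///back/myback2. "
            "Backup folder must already exist."
        )
    },
})

for node, reason in failure_reasons(example).items():
    print(f"{node}: {reason}")
```

Messages without "because" are passed through unchanged, so the helper degrades gracefully if the wording changes.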
One unrelated observation: part of the error response says *Could not 
backup all replicas*. While any core could be called a "replica", in the 
context of a collection backup, wouldn't it be more precise to say *Could not 
backup all shards*? [~markrmil...@gmail.com] it's your code line :) 
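To make the suggestion concrete: since the user asked to back up a collection, a report naming the shards that are missing from the backup is arguably easier to act on than one naming cores. A hypothetical Python sketch of such wording, assuming core names follow the `<collection>_<shard>_replica_<type><N>` pattern seen in this issue (e.g. coll2_shard1_replica_n2); this is not Solr's code, and the real message is produced server-side in Java.

```python
import re

# Assumed core-name pattern from this issue: <collection>_<shard>_replica_<type><N>,
# e.g. coll2_shard1_replica_n2 (type letter n/t/p for NRT/TLOG/PULL replicas).
CORE_NAME = re.compile(r"^(?P<collection>.+)_(?P<shard>[^_]+)_replica_[ntp]\d+$")


def shard_failure_message(failed_cores: list[str]) -> str:
    """Hypothetical wording that reports failed shards instead of replicas."""
    shards: list[str] = []
    for core in failed_cores:
        match = CORE_NAME.match(core)
        # Fall back to the raw core name if it does not fit the assumed pattern.
        shard = match.group("shard") if match else core
        if shard not in shards:  # keep first-seen order, drop duplicates
            shards.append(shard)
    return "Could not backup all shards, failed: " + ", ".join(shards)


print(shard_failure_message(["coll2_shard1_replica_n2"]))
```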


> Confusing error reporting if backup attempted on non-shared FS
> --------------------------------------------------------------
>
>                 Key: SOLR-12523
>                 URL: https://issues.apache.org/jira/browse/SOLR-12523
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 7.3.1
>            Reporter: Timothy Potter
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back 
> it up with:
> {code}
> curl "http://localhost:8984/solr/admin/collections?action=BACKUP&name=sigs&collection=foo_signals&async=5&location=backups";
> {code}
> I either get:
> {code}
> "5170256188349065":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///vol1/cloud84/backups/sigs"},
>   "5170256187999044":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard3_replica_n10 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///vol1/cloud84/backups/sigs"},
> {code}
> or if I create the directory, then I get:
> {code}
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":2},
>   "Operation backup caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The backup directory already exists: file:///vol1/cloud84/backups/sigs/",
>   "exception":{
>     "msg":"The backup directory already exists: file:///vol1/cloud84/backups/sigs/",
>     "rspCode":400},
>   "status":{
>     "state":"failed",
>     "msg":"found [2] in failed tasks"}}
> {code}
> I'm thinking this has to do with having 2 cores from the same collection on 
> the same node, but I can't get a collection with 1 shard on each node to work 
> either:
> {code}
> "ec2-52-90-245-38.compute-1.amazonaws.com:8984_solr":"org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://ec2-52-90-245-38.compute-1.amazonaws.com:8984/solr: Failed to backup core=system_jobs_history_shard2_replica_n6 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///vol1/cloud84/backups/ugh1"}
> {code}
> What's weird is that replica (system_jobs_history_shard2_replica_n6) is not 
> even on the ec2-52-90-245-38.compute-1.amazonaws.com node! It lives on a 
> different node.
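The report hits both sides of the problem: without the directory, cores on other nodes fail with "Directory to contain snapshots doesn't exist"; after creating it by hand, the request fails with "The backup directory already exists". A per-node sanity check before calling BACKUP could catch a missing location or a leftover backup directory. A minimal Python sketch, assuming an absolute local path rather than a URI or the relative `location=backups` from the curl call; `backup_preflight` is a hypothetical helper, not a Solr tool. Run on one node it cannot prove the nodes share a mount, which is the actual requirement; it only checks that node.

```python
import os
import tempfile


def backup_preflight(location: str, name: str) -> list[str]:
    """Hypothetical per-node check to run before a Collections API BACKUP call.

    Assumes `location` is an absolute local path such as /vol1/cloud84/backups.
    Returns a list of problems; an empty list means this node looks ready.
    """
    problems = []
    # The location itself must already exist on this node.
    if not os.path.isdir(location):
        problems.append(f"backup location missing on this node: {location}")
    # Solr reports "The backup directory already exists" if location/name is present.
    target = os.path.join(location, name)
    if os.path.exists(target):
        problems.append(f"backup directory already exists: {target}")
    return problems


# Usage against a throwaway directory standing in for the shared mount.
with tempfile.TemporaryDirectory() as base:
    print(backup_preflight(base, "sigs"))
    os.mkdir(os.path.join(base, "sigs"))
    print(backup_preflight(base, "sigs"))
```

The check is meant to be repeated on every node with the same path; identical results on all nodes are necessary but not sufficient evidence of a shared file system.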



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
