I was able to reproduce this using the commands in the script at
https://paste.apache.org/o4uta.  They should work from within any
clean 8.9.0 checkout, if anyone else is interested in reproducing.

After reproducing I noticed some errors in the logs that probably
point to the real root cause here:

Caused by: java.lang.IllegalArgumentException: Unable to parse invalid ShardBackupId: md_shard2_0_0
  at org.apache.solr.core.backup.ShardBackupId.from(ShardBackupId.java:59)
  at org.apache.solr.handler.admin.BackupCoreOp.parseShardBackupId(BackupCoreOp.java:99)
  at org.apache.solr.handler.admin.BackupCoreOp.execute(BackupCoreOp.java:44)
  at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
  ... 43 more

It looks like the "ShardBackupId" parsing code is too brittle to
handle the shard name that results from a splitshard.  I've filed
SOLR-15696 for this since it's a clear bug.
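
For illustration, here is a minimal sketch of how this kind of failure can
arise.  It is not the actual ShardBackupId code.  It assumes the ID has
the form md_<shardName>_<backupNumber>, that a brittle parser splits on
every underscore and expects exactly three pieces, and that a splitshard
on "shard2" produces a sub-shard named "shard2_0".  The class and method
names (ShardBackupIdSketch, parseBrittle, parseTolerant) are hypothetical.

```java
import java.util.Arrays;

public class ShardBackupIdSketch {

    // Hypothetical brittle parser: assumes shard names never contain '_',
    // so "md_<shardName>_<backupNumber>" always splits into exactly 3 parts.
    public static String[] parseBrittle(String id) {
        String[] parts = id.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Unable to parse invalid ShardBackupId: " + id);
        }
        return new String[] {parts[1], parts[2]};
    }

    // Hypothetical tolerant parser: only the first and last '_' are treated
    // as delimiters, so the shard name itself may contain underscores.
    public static String[] parseTolerant(String id) {
        int first = id.indexOf('_');
        int last = id.lastIndexOf('_');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException(
                "Unable to parse invalid ShardBackupId: " + id);
        }
        return new String[] {id.substring(first + 1, last), id.substring(last + 1)};
    }

    public static void main(String[] args) {
        // A pre-split shard name parses fine either way.
        System.out.println(Arrays.toString(parseBrittle("md_shard1_0")));
        // A post-split shard name only parses with the tolerant variant.
        System.out.println(Arrays.toString(parseTolerant("md_shard2_0_0")));
        try {
            parseBrittle("md_shard2_0_0");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under those assumptions the brittle variant rejects md_shard2_0_0 because
it splits into four pieces, while the tolerant variant reads the shard
name as "shard2_0" and the backup number as "0".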

(That said - Jordan - feel free to create and close a "test" JIRA if
you'd still like to validate that you have the right permissions to do
so.)

On Fri, Oct 15, 2021 at 1:49 PM Jason Gerlowski <[email protected]> wrote:
>
> One last thought for now:
>
> One additional workaround option to try would be to change the
> 'location' path or 'name' parameters provided to Solr at backup time
> after any change in the number of shards.  If the post-splitshard
> backups are stored in a subdirectory location or under a different
> name, I suspect you'd avoid whatever missing file weirdness you've run
> into so far.
>
> On Fri, Oct 15, 2021 at 1:45 PM Jason Gerlowski <[email protected]> wrote:
> >
> > Hi Jordan,
> >
> > Sorry you're running into problems with incremental backups (and with 
> > JIRA!).
> >
> > I suspect you just ran into a temporary problem with JIRA - as long as
> > you have an account and are logged in you should 100% be able to file
> > JIRA tickets the way you described.  Please give it another shot and
> > let me know if the issue still persists.  If you're still unable to
> > create the ticket I'm happy to do so on your behalf.
> >
> > In terms of the actual behavior you're seeing, I know that splitshards
> > can cause hiccups in backup/restore workflows, but I would expect
> > those to happen primarily at restore-time. (A change in the number of
> > shards effectively prevents any previously backed up data from being
> > > restored to the original collection. An 'N'-shard backup can't be
> > restored to a collection with 'N+1' shards.)
> >
> > So at first glance your report above sounds like a bug.  That said,
> > I'm returning from an extended leave and am pretty rusty on some of
> > the specifics here.  I'll work on reproducing this myself and try to
> > figure out if this is a real problem or some other "known" limitation
> > that I'd forgotten about.
> >
> > Best,
> >
> > Jason
> >
> > On Thu, Sep 30, 2021 at 12:34 PM Jordan Diehl
> > <[email protected]> wrote:
> > >
> > > Hello,
> > > I just tried opening a Jira ticket for an issue I was seeing, but after 
> > > filling out all the info and hitting create it didn't work. Now any time 
> > > I click create I get an error message saying "The Jira server could not 
> > > be contacted. This may be a temporary glitch or the server may be down.". 
> > > I also tried this on 3 other computers, but they all hit the same issue. 
> > > Once they try to create the bug they are permanently blocked from bug 
> > > creation. I'm not sure what to do at this point, so this email seems to 
> > > be my last chance to submit this bug. I would really appreciate if 
> > > someone could either create this bug, or if there is a known issue with 
> > > Jira right now, then let me know what that issue is and how I should 
> > > proceed.
> > >
> > >
> > > Bug Info
> > > Summary: Incremental backup attempts fail after a shard split operation 
> > > has completed
> > > Component: Backup/Restore
> > > Affects Version: 8.9
> > > Description:
> > > I have been attempting to use the incremental backup API on Solr 8.9.0, 
> > > but while testing in our product we would occasionally get into a state 
> > > where all subsequent backup attempts would fail. After some triage we 
> > > found that it was happening to any collection which had undergone a shard 
> > > split operation. If we did a backup, completed a shard split operation, 
> > > > then attempted another backup, the second backup would fail with a
> > > > NoSuchFileException whose message refers to the backup ID of the
> > > > second backup.
> > >
> > > > Steps to reproduce:
> > > >
> > > > 1. Create a new collection with no associated backups
> > > > 2. Run a backup for this collection
> > > >    /admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
> > > > 3. Run a shard split operation
> > > >    /admin/collections?action=SPLITSHARD&collection=name&shard=shardID
> > > > 4. Attempt another backup
> > >
> > >
> > > Expected Outcome:
> > >
> > > * If this operation is being blocked intentionally, then I would expect 
> > > an informative error message explaining why it failed. Otherwise I would 
> > > expect the backup to complete successfully.
> > >
> > >
> > > Actual Outcome:
> > >
> > > * The backup operation fails with a NoSuchFileException.
> > >
> > > > NOTE: In the exception message below, the number in the name of the
> > > > missing file (in this case zk_backup_1) corresponds to the backup
> > > > that is currently being attempted.
> > >
> > > {
> > >
> > >   "responseHeader":{
> > >
> > >     "status":500,
> > >
> > >     "QTime":54},
> > >
> > >   "failure":{
> > >
> > >     
> > > "MYIPADDRESS:31018_solr":"org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:Error
> > >  from server at null: Error handling 'BACKUPCORE' action"},
> > >
> > >   "Operation backup caused 
> > > exception:":"java.nio.file.NoSuchFileException:java.nio.file.NoSuchFileException:
> > >  /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
> > >
> > >   "exception":{
> > >
> > >     
> > > "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
> > >
> > >     "rspCode":-1},
> > >
> > >   "error":{
> > >
> > >     "metadata":[
> > >
> > >       "error-class","org.apache.solr.common.SolrException",
> > >
> > >       "root-error-class","org.apache.solr.common.SolrException"],
> > >
> > >     
> > > "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
> > >
> > >     "trace":"org.apache.solr.common.SolrException: 
> > > /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1\n\tat
> > >  
> > > org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat
> > >  
> > > org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:301)\n\tat
> > >  
> > > org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat
> > >  
> > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)\n\tat
> > >  
> > > org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)\n\tat
> > >  
> > > org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)\n\tat
> > >  org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)\n\tat 
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat
> > >  
> > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)\n\tat
> > >  
> > > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)\n\tat
> > >  
> > > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat
> > >  
> > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> > >  
> > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
> > >  
> > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
> > >  
> > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat
> > >  
> > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> > >  
> > > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
> > >  
> > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
> > >  org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat 
> > > org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)\n\tat
> > >  
> > > org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)\n\tat 
> > > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)\n\tat 
> > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat
> > >  
> > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
> > >  org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat 
> > > org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)\n\tat
> > >  
> > > org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)\n\tat
> > >  java.lang.Thread.run(Thread.java:748)\n",
> > >
> > >     "code":500}}
> > >
> > >
> > > I tried a few different workaround attempts, but after going through 
> > > these steps I wasn’t able to run another backup for the collection.
> > >
> > >
> > > > Workaround attempt 1:
> > > >
> > > > 1. Used the API to delete the backup
> > > > 2. Used the API to purge unused backup files
> > > > 3. Restarted Solr
> > > > 4. Attempted another backup
> > > > 5. Encountered the same failure
> > >
> > >
> > > > Workaround attempt 2:
> > > >
> > > > 1. Deleted all files in my Solr backup mount location
> > > > 2. Restarted Solr
> > > > 3. Attempted another backup
> > > > 4. Encountered the same failure
> > >
> > >
> > > Thanks for your time,
> > >
> > > Jordan Diehl
> > >
> > >
