[jira] [Commented] (SOLR-17821) InstallShardData and Recover do not handle failures gracefully

David Smiley (Jira) Wed, 24 Jun 2026 19:38:11 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-17821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091368#comment-18091368
 ]


David Smiley commented on SOLR-17821:
-------------------------------------

I'm seeing a fresh test failure from this on branch_10x, run on Crave.  
The Develocity report is clean, though.


{{./gradlew :solr:modules:gcs-repository:test --tests 
"org.apache.solr.gcs.GCSIncrementalBackupTest.testSkipConfigset" 
"-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC 
-XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" 
-Ptests.seed=98FD12CCFD59AE01 -Ptests.timeoutSuite=600000! 
-Ptests.useSecurityManager=true -Ptests.file.encoding=ISO-8859-1}}
{noformat}
  2> java.lang.UnsupportedOperationException
  2>    at org.apache.solr.cloud.api.collections.AbstractIncrementalBackupTest$ErrorThrowingTrackingBackupRepository.copyIndexFileTo(AbstractIncrementalBackupTest.java:609)
  2>    at org.apache.solr.handler.RestoreCore$ShardBackupIdRestoreRepository.repoCopy(RestoreCore.java:362)
  2>    at org.apache.solr.handler.RestoreCore.lambda$doRestore$1(RestoreCore.java:175)
  2>    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
  2>    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
  2>    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:408)
  2>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  2>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  2>    at java.base/java.lang.Thread.run(Thread.java:1583)
{noformat}

> InstallShardData and Recover do not handle failures gracefully
> --------------------------------------------------------------
>
>                 Key: SOLR-17821
>                 URL: https://issues.apache.org/jira/browse/SOLR-17821
>             Project: Solr
>          Issue Type: Bug
>          Components: Backup/Restore
>            Reporter: Houston Putman
>            Assignee: Houston Putman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.1
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Whenever a ShardInstall or Recover command succeeds, the shard zk terms will 
> only be updated to reflect that they are not zero anymore. This is actually 
> handled down in the InstallCoreData cmd, so if 1 core recover/install 
> succeeds, then the zk terms will all be either untouched (if the terms are 
> non-zero to start) or will all be set to 1. This does not handle errors 
> gracefully.
> What we actually want to do is increase the terms of the successful replicas, 
> and then the non-successful replicas can start to recover from the successful 
> ones. If the leader was unsuccessful, it should give up leadership because 
> its shard term is no longer the highest.
> Since shardInstall requires collections be read-only, we also need to fix the 
> issues with read-only and recovery.
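
For readers following the quoted description, the desired term handling can be sketched roughly as below. This is only an illustration under stated assumptions: it models shard terms as a plain in-memory map rather than using Solr's actual ZooKeeper-backed term handling, and the names ShardTermsSketch, bumpSuccessful and mustGiveUpLeadership are hypothetical, not taken from the Solr code base or from the pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory model of per-replica shard terms.
// It does not touch ZooKeeper and is not Solr's implementation.
final class ShardTermsSketch {

    // Returns a copy of the terms in which every successful replica is
    // raised above the current maximum; failed replicas keep their old term.
    static Map<String, Long> bumpSuccessful(Map<String, Long> terms, Set<String> successful) {
        long max = highestTerm(terms);
        Map<String, Long> updated = new HashMap<>(terms);
        for (String replica : successful) {
            if (updated.containsKey(replica)) {
                updated.put(replica, max + 1);
            }
        }
        return updated;
    }

    // A leader whose term is no longer the highest should give up leadership.
    static boolean mustGiveUpLeadership(Map<String, Long> terms, String leader) {
        return terms.getOrDefault(leader, 0L) < highestTerm(terms);
    }

    private static long highestTerm(Map<String, Long> terms) {
        return terms.values().stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```

In this sketch, a replica whose install or recover failed ends up with a lower term than the successful ones, so it would be expected to recover from them, and a failed leader is flagged to step down.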



