[
https://issues.apache.org/jira/browse/SOLR-17821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091368#comment-18091368
]
David Smiley commented on SOLR-17821:
-------------------------------------
I'm seeing a new test failure from this on branch_10x, run on Crave.
The Develocity report is clean, though.
{{./gradlew :solr:modules:gcs-repository:test --tests "org.apache.solr.gcs.GCSIncrementalBackupTest.testSkipConfigset" "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=98FD12CCFD59AE01 -Ptests.timeoutSuite=600000! -Ptests.useSecurityManager=true -Ptests.file.encoding=ISO-8859-1}}
{noformat}
2> java.lang.UnsupportedOperationException
2> at org.apache.solr.cloud.api.collections.AbstractIncrementalBackupTest$ErrorThrowingTrackingBackupRepository.copyIndexFileTo(AbstractIncrementalBackupTest.java:609)
2> at org.apache.solr.handler.RestoreCore$ShardBackupIdRestoreRepository.repoCopy(RestoreCore.java:362)
2> at org.apache.solr.handler.RestoreCore.lambda$doRestore$1(RestoreCore.java:175)
2> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
2> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
2> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:408)
2> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
2> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
2> at java.base/java.lang.Thread.run(Thread.java:1583)
{noformat}
> InstallShardData and Recover do not handle failures gracefully
> --------------------------------------------------------------
>
> Key: SOLR-17821
> URL: https://issues.apache.org/jira/browse/SOLR-17821
> Project: Solr
> Issue Type: Bug
> Components: Backup/Restore
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Whenever a ShardInstall or Recover command succeeds, the shard ZK terms are
> only updated to reflect that they are no longer zero. This is handled down
> in the InstallCoreData command, so if even one core's recover/install
> succeeds, the ZK terms will either all be left untouched (if they were
> non-zero to start) or all be set to 1. This does not handle errors
> gracefully.
> What we actually want to do is increase the terms of the successful replicas,
> and then the non-successful replicas can start to recover from the successful
> ones. If the leader was unsuccessful, it should give up leadership because
> its shard term is no longer the highest.
> Since shardInstall requires collections be read-only, we also need to fix the
> issues with read-only and recovery.
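
For illustration only, the desired term handling described above could be sketched roughly as follows. This is not Solr's actual shard-terms code: the class {{ShardTermsSketch}}, its methods, the replica names, and the choice of "highest current term + 1" as the new term are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is not Solr's shard-terms API. */
final class ShardTermsSketch {

  /** Highest term currently held by any replica (0 if there are none). */
  static long maxTerm(Map<String, Long> terms) {
    long max = 0L;
    for (long t : terms.values()) {
      max = Math.max(max, t);
    }
    return max;
  }

  /**
   * Returns a copy of the terms in which only the successful replicas are
   * raised to (highest current term + 1). Failed replicas keep their old term.
   * The "+1" step is an assumption of this sketch.
   */
  static Map<String, Long> bumpSuccessful(Map<String, Long> terms, Set<String> successful) {
    long next = maxTerm(terms) + 1;
    Map<String, Long> updated = new HashMap<>(terms);
    for (String replica : successful) {
      if (updated.containsKey(replica)) {
        updated.put(replica, next);
      }
    }
    return updated;
  }

  /** A replica below the highest term has to recover, and cannot stay leader. */
  static boolean needsRecovery(Map<String, Long> terms, String replica) {
    return terms.get(replica) < maxTerm(terms);
  }

  public static void main(String[] args) {
    // Assumed scenario: the leader (core_node1) failed, the other two succeeded.
    Map<String, Long> before = Map.of("core_node1", 0L, "core_node2", 0L, "core_node3", 0L);
    Map<String, Long> after = bumpSuccessful(before, Set.of("core_node2", "core_node3"));
    System.out.println("leader must give up leadership: " + needsRecovery(after, "core_node1"));
  }
}
```

In this sketch a failed leader ends up with a lower term than the replicas that succeeded, which is the condition under which it should give up leadership and recover from one of them.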