[
https://issues.apache.org/jira/browse/SOLR-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813066#comment-17813066
]
Alexey Serba commented on SOLR-16879:
-------------------------------------
I think this feature introduced a regression: collections with more than 10
shards can no longer be backed up, because the thread pool rejects new tasks:
{noformat}
SolrException: Could not backup all shards
Task
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda...
rejected from
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor[Running,
pool size = 5, active threads = 5, queued tasks = 5, completed tasks = 30]"
{noformat}
The {{expensiveExecutor}} thread pool is
[created|https://github.com/apache/solr/blob/releases/solr/9.4.1/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L170-L174]
with a maximum of 5 threads and a bounded queue of the same size (5), so at
most 10 tasks can be running or queued at once. Any further task submitted
while the pool and queue are full is rejected immediately.
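The limit can be reproduced outside Solr. The following is a minimal sketch, not Solr's actual code: it uses a plain {{java.util.concurrent.ThreadPoolExecutor}} with an {{ArrayBlockingQueue}} as a stand-in for {{MDCAwareThreadPoolExecutor}}, and the class and method names ({{BoundedPoolRejectionDemo}}, {{countRejected}}) are hypothetical. It assumes the default rejection policy ({{AbortPolicy}}), which matches the "rejected from" message in the log above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a stand-in for the Solr executor, not Solr code.
class BoundedPoolRejectionDemo {

  // Submits `tasks` long-running tasks to a pool with 5 threads and a
  // queue of capacity 5, and returns how many submissions were rejected.
  static int countRejected(int tasks) throws InterruptedException {
    // Keeps every accepted task busy until all submissions have been attempted,
    // which mimics slow per-shard backup work.
    CountDownLatch release = new CountDownLatch(1);
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        5, 5, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(5));
    int rejected = 0;
    try {
      for (int i = 0; i < tasks; i++) {
        try {
          pool.execute(() -> {
            try {
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
        } catch (RejectedExecutionException e) {
          // Default AbortPolicy: thrown once 5 tasks run and 5 are queued.
          rejected++;
        }
      }
    } finally {
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
    return rejected;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("10 tasks, rejected: " + countRejected(10)); // 0
    System.out.println("16 tasks, rejected: " + countRejected(16)); // 6
  }
}
```

With 10 tasks everything is accepted (5 running, 5 queued). The 11th and later submissions fail, which is consistent with backups of collections with more than 10 shards failing when each shard is submitted as its own task.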
> Throttle concurrent backups/restores per node
> ---------------------------------------------
>
> Key: SOLR-16879
> URL: https://issues.apache.org/jira/browse/SOLR-16879
> Project: Solr
> Issue Type: Improvement
> Components: Backup/Restore
> Affects Versions: 9.2.1
> Reporter: Pierre Salagnac
> Priority: Minor
> Time Spent: 3h
> Remaining Estimate: 0h
>
> If the collection is large enough, there could well be many shards on
> one host, and backing them up could saturate the IO. The same issue arises
> if we back up many collections concurrently.
> We should have a protection mechanism, so a Solr node does not have transient
> failures during a large backup or restore.