Jason, I haven't done much scalability testing, so it's hard to give accurate numbers on when we start having issues. For the environment I looked at in detail, we run a 16-node cluster, and the collection I wasn't able to back up has about 1500 shards, ~1.5 GB each.
Core backups/restores are expensive calls compared to other admin calls such as an empty core creation. I'm not sure of the full list of expensive and non-expensive operations, but we may also have a fairness issue when expensive operations block the non-expensive ones for a while. I had a great discussion with David.

I see two options for a long-term fix:

1. Throttle core backups/restores at the top level (in the overseer). My approach was to avoid sending too many concurrent requests from BackupCmd. I have a decent POC for backups, but it should be refactored to share this mechanism across all admin operations. It will be harder to achieve in distributed mode (no overseer), since we'll need a central place to count how many backups are in flight. We could take some kind of lock in ZooKeeper for this, so it may end up overly complex.

2. Throttle in each node.
- Currently, all async admin operations are handled by a ThreadPoolExecutor defined in the CoreAdminHandler class. This pool is hardcoded with a size of 50, so if we receive more than 50 concurrent tasks, we add them to the executor's queue. For a large collection backup, each node immediately starts 50 concurrent core snapshots.
- Instead of submitting to the executor immediately, I think we should manage our own queue here for expensive operations. By counting the number of in-flight backups/restores, we don't submit more than (let's say) 10 concurrent backups to the executor. Each time one completes, we poll the queue and start the next one if appropriate. (A rough sketch of what I mean is appended below the quoted thread.)
- This ensures fairness between expensive and non-expensive operations: non-expensive ones will always have at least 40 threads available to be handled quickly. And this works well in distributed mode since the overseer is not involved.
- This could be extended to define more than one queue, but I'm not sure it is worth it.

Pierre

On Tue, Jun 27, 2023 at 7:16 PM David Smiley <dsmi...@apache.org> wrote:

> Here's a POC: https://github.com/apache/solr/pull/1729
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Jun 19, 2023 at 3:36 PM David Smiley <dsmi...@apache.org> wrote:
>
> > Has anyone mitigated the potentially large IO impact of doing a backup of
> > a large collection or just in general? If the collection is large enough,
> > there very well could be many shards on one host and it could saturate the
> > IO. I wonder if there should be a rate limit mechanism or some other
> > mechanism.
> >
> > Not the same but I know that at a segment level, the merges are rate
> > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > adjusts itself automatically ("ioThrottle" boolean).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
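
P.S. To make the per-node idea (option 2) a bit more concrete, here is a rough, untested sketch of the kind of gate I have in mind. The class and method names are made up, not existing Solr code, and the cap of 10 is just the example number from above; the real thing would sit in front of the CoreAdminHandler executor.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;

/**
 * Illustrative sketch only: caps the number of "expensive" core admin tasks
 * (backup/restore) running at once on a node, while cheap tasks go straight
 * to the shared executor.
 */
public class ExpensiveOpGate {
  private static final int MAX_EXPENSIVE = 10; // leaves ~40 of the 50 pool threads for cheap ops

  private final ExecutorService executor;       // the existing async admin pool
  private final Queue<Runnable> pending = new ArrayDeque<>();
  private int inFlight = 0;

  public ExpensiveOpGate(ExecutorService executor) {
    this.executor = executor;
  }

  /** Cheap admin operations bypass the gate entirely. */
  public void submitCheap(Runnable task) {
    executor.submit(task);
  }

  /** Expensive operations are queued locally once the cap is reached. */
  public synchronized void submitExpensive(Runnable task) {
    if (inFlight < MAX_EXPENSIVE) {
      inFlight++;
      executor.submit(wrap(task));
    } else {
      pending.add(task);
    }
  }

  /** When an expensive task finishes, start the next queued one, if any. */
  private synchronized void onExpensiveDone() {
    Runnable next = pending.poll();
    if (next != null) {
      executor.submit(wrap(next));   // in-flight count stays the same
    } else {
      inFlight--;
    }
  }

  private Runnable wrap(Runnable task) {
    return () -> {
      try {
        task.run();
      } finally {
        onExpensiveDone();
      }
    };
  }
}

Cheap operations never touch the local queue, so they keep the remaining pool threads; only backups/restores are counted and deferred, which is the fairness property described above.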