Thanks for starting this thread, David. I've been working on this internally, since we see query failures during backups of big collections because of IO saturation.

I see two different approaches to solve this:
1. Throttle at the IO level, like David mentioned.
2. Limit the number of cores we back up concurrently.
(These two options are *not* mutually exclusive.)

I've been focused on the second option, limiting the number of concurrent backups per node. Currently, the overseer sends shard requests to all shards in a simple 'for' loop. If the collection has one thousand shards, we start one thousand concurrent backups. The idea is to send shard-level requests only up to a certain limit per node, and then, each time a shard completes, send the next one for that node.

If you're interested, I integrated my experiment (for non-incremental backups) here:
https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5

I don't think backup is the only operation that should be considered. Restore, at least, is another one; I'm not sure whether we have other IO-intensive operations at the collection level. Ideally, we should have something generic rather than handling each type of operation individually.

Thanks

On Tue, 20 Jun 2023 at 09:58, Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

> Might be a good question for users@ list, I guess. I'm sure other users
> must've thought about this.
> Cross posting there, as I'm curious myself too.
>
> On Tue, 20 Jun 2023 at 01:07, David Smiley <dsmi...@apache.org> wrote:
>
> > Has anyone mitigated the potentially large IO impact of doing a backup of a
> > large collection or just in general? If the collection is large enough,
> > there very well could be many shards on one host and it could saturate the
> > IO. I wonder if there should be a rate limit mechanism or some other
> > mechanism.
> >
> > Not the same but I know that at a segment level, the merges are rate
> > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > adjusts itself automatically ("ioThrottle" boolean).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
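
For illustration, the per-node limit described in the reply above (send shard requests only up to a limit per node, then send the next one each time a shard completes) could be sketched roughly as follows. This is not the code from the linked commit and it does not use any Solr API: the class `PerNodeDispatcher`, its methods `submit` and `onShardComplete`, and the `sender` callback are all hypothetical names chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.BiConsumer;

/** Hypothetical sketch (not Solr code): caps the number of in-flight shard requests per node. */
class PerNodeDispatcher {
    private final int maxPerNode;
    // Assumed callback that actually sends the request: (nodeName, shardName).
    private final BiConsumer<String, String> sender;
    private final Map<String, Queue<String>> pending = new HashMap<>();
    private final Map<String, Integer> inFlight = new HashMap<>();

    PerNodeDispatcher(int maxPerNode, BiConsumer<String, String> sender) {
        if (maxPerNode < 1) {
            throw new IllegalArgumentException("maxPerNode must be >= 1");
        }
        this.maxPerNode = maxPerNode;
        this.sender = sender;
    }

    /** Queues a shard request; it is sent immediately only if the node has a free slot. */
    synchronized void submit(String node, String shard) {
        pending.computeIfAbsent(node, n -> new ArrayDeque<>()).add(shard);
        drain(node);
    }

    /** Call when a shard request finishes (success or failure) to free a slot on that node. */
    synchronized void onShardComplete(String node) {
        if (inFlight.getOrDefault(node, 0) <= 0) {
            throw new IllegalStateException("no in-flight request for " + node);
        }
        inFlight.merge(node, -1, Integer::sum);
        drain(node);
    }

    /** Number of requests currently in flight for the given node. */
    synchronized int inFlight(String node) {
        return inFlight.getOrDefault(node, 0);
    }

    // Sends queued requests for this node until the per-node limit is reached.
    private void drain(String node) {
        Queue<String> queue = pending.get(node);
        while (queue != null && !queue.isEmpty()
                && inFlight.getOrDefault(node, 0) < maxPerNode) {
            inFlight.merge(node, 1, Integer::sum);
            sender.accept(node, queue.poll());
        }
    }
}
```

With a limit of 2, submitting three shards for the same node sends the first two right away and holds the third until `onShardComplete` is called for that node. Since the bookkeeping is keyed by node rather than by operation type, the same shape could in principle cover restore as well, though that is an assumption and not something the experiment above claims.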