Jason, I haven't done much scalability testing, so it's hard to give
accurate numbers on when we start having issues.
For the environment I looked at in detail, we run a 16-node cluster, and
the collection I wasn't able to back up has about 1500 shards, ~1.5 GB each.

Core backups/restores are expensive calls compared to other admin calls,
like an empty core creation. I'm not sure of the full list of expensive and
non-expensive operations, but we may also have a fairness issue when
expensive operations block the non-expensive ones for a while.

I had a great discussion with David.
I see two options for a long-term fix:

1. Throttle core backups/restores at the top level (in the overseer).
My approach was not to send too many concurrent requests from BackupCmd.
I have a decent POC for backups, but it should be refactored to share this
mechanism across all admin operations (a rough sketch follows below, after
option 2).
It will be harder to achieve in distributed mode (no overseer), since we'd
need a central place to count how many backups are in flight. We might use
some kind of lock in ZooKeeper for this, so it may end up overly complex.


2. Throttle in each node.
- Currently, all async admin operations are handled by a ThreadPoolExecutor
defined in the CoreAdminHandler class. This pool is hardcoded to a size of
50, so if we receive more than 50 concurrent tasks, we add them to the
executor's queue. For a large collection backup, each node immediately
starts 50 concurrent core snapshots.
- Instead of immediately submitting to the executor, I think we should
manage our own queue here for expensive operations. By counting the number
of in-flight backups/restores, we never submit more than (let's say) 10
concurrent backups to the executor. Each time one completes, we poll the
queue and start the next one if appropriate (see the second sketch below).
- This ensures fairness between expensive and non-expensive operations:
non-expensive ones will always have at least 40 threads available to handle
them quickly. And this works well in distributed mode since the overseer is
not involved.
- This could be extended to define more than one queue, but I'm not sure it
is worth it.
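
For option 1, here is a minimal sketch of the fan-out throttling, assuming
a hypothetical wrapper rather than the real BackupCmd internals; the class
name, submitCoreBackupRequest() and the limit of 10 are placeholders I made
up, not existing Solr code:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledBackupFanOut {

  // Assumed limit; the real value would probably be configurable.
  private static final int MAX_CONCURRENT_CORE_BACKUPS = 10;

  private final Semaphore permits = new Semaphore(MAX_CONCURRENT_CORE_BACKUPS);
  private final ExecutorService executor = Executors.newCachedThreadPool();

  /** Fan out one core-backup request per core, never more than the permit count in flight. */
  public void backupAllCores(List<String> coresToBackup) throws InterruptedException {
    for (String core : coresToBackup) {
      permits.acquire(); // blocks once MAX_CONCURRENT_CORE_BACKUPS requests are in flight
      executor.submit(() -> {
        try {
          submitCoreBackupRequest(core); // placeholder for the real async core admin call
        } finally {
          permits.release(); // free a slot so the next core backup can start
        }
      });
    }
  }

  private void submitCoreBackupRequest(String core) {
    // Placeholder: issue the per-core backup request and wait for it to complete.
  }
}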
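
For option 2, a minimal sketch of the per-node queue, again with assumed
names (MAX_EXPENSIVE, submitCheap()/submitExpensive()) rather than the
actual CoreAdminHandler API:

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExpensiveOpThrottle {

  private static final int POOL_SIZE = 50;     // size of the existing admin pool
  private static final int MAX_EXPENSIVE = 10; // assumed cap on concurrent backups/restores

  private final ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
  private final Queue<Runnable> pendingExpensive = new ArrayDeque<>();
  private int inFlightExpensive = 0;

  /** Non-expensive admin tasks go straight to the executor. */
  public void submitCheap(Runnable task) {
    executor.submit(task);
  }

  /** Expensive tasks (core backup/restore) are held back once MAX_EXPENSIVE are running. */
  public synchronized void submitExpensive(Runnable task) {
    if (inFlightExpensive < MAX_EXPENSIVE) {
      inFlightExpensive++;
      executor.submit(wrap(task));
    } else {
      pendingExpensive.add(task); // parked in our own queue, not the executor's
    }
  }

  /** When an expensive task finishes, hand its slot to the next queued one, if any. */
  private synchronized void onExpensiveDone() {
    Runnable next = pendingExpensive.poll();
    if (next != null) {
      executor.submit(wrap(next)); // the in-flight count stays the same
    } else {
      inFlightExpensive--;
    }
  }

  private Runnable wrap(Runnable task) {
    return () -> {
      try {
        task.run();
      } finally {
        onExpensiveDone();
      }
    };
  }
}

The point is that expensive tasks never pile up in the executor's own
queue, so cheap admin calls can't get stuck behind a backlog of backups.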

Pierre


On Tue, Jun 27, 2023 at 7:16 PM David Smiley <dsmi...@apache.org> wrote:

> Here's a POC: https://github.com/apache/solr/pull/1729
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Jun 19, 2023 at 3:36 PM David Smiley <dsmi...@apache.org> wrote:
>
> > Has anyone mitigated the potentially large IO impact of doing a backup
> > of a large collection or just in general?  If the collection is large
> > enough, there very well could be many shards on one host and it could
> > saturate the IO.  I wonder if there should be a rate limit mechanism or
> > some other mechanism.
> >
> > Not the same but I know that at a segment level, the merges are rate
> > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > adjusts itself automatically ("ioThrottle" boolean).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
>
