Getting to grips with auto-scaling

Tom Evans Fri, 05 Jun 2020 12:00:13 -0700

Hi

I'm trying to get a handle on the newer auto-scaling features in Solr.
We're in the process of upgrading an older SolrCloud cluster from 5.5
to 8.5, and re-architecting it slightly to improve performance and
automate operations.


If I boil it down slightly, currently we have two collections, "items"
and "lists". Both collections have just one shard. We publish new data
to "items" once each day, and our users search and do analysis on
it, whilst "lists" contains NRT user-specified collections of ids
from "items", which we join against from "items" so that users can
restrict their searches/analysis to just the docs in their curated lists.

Most of our searches have specific date ranges in them, usually only
from the last 3 years or so, but sometimes we need to do searches
across all the data. With the new setup, we want to:

* shard by date (year) to make the hottest data available in smaller shards
* have more nodes hosting the hot shards than hosting the older data
* be able to add/remove nodes predictably based upon our clients'
(predictable) query load
* use TLOG replicas for "items" and NRT replicas for "lists", to avoid
unnecessary indexing load for "items" while keeping "lists" near real time
* spread cores across two AZs

With that in mind, I came up with a bunch of simplified rules for
testing, with just 4 shards for "items":

* "lists" collection has one NRT replica on each node
* "items" collection shard 2020 has one TLOG replica on each node
* "items" collection shard 2019 has one TLOG replica on 75% of nodes
* "items" collection shards 2018 and 2017 each have one TLOG replica
on 50% of nodes
* all shards have at least 2 replicas if number of nodes > 1
* no node should have 2 replicas of the same shard
* number of cores should be balanced across nodes
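
To make the rules above concrete, here is a minimal Python sketch of how
the replica count per shard could be derived from the node count. It is
plain arithmetic, not Solr policy syntax. The function name, the "shard1"
name for "lists", and the choice to round fractions up are all assumptions
made for illustration.

```python
import math

# Fraction of nodes that should host one replica of each shard, taken from
# the rules above. "shard1" is a hypothetical name for the single "lists" shard.
NODE_FRACTION = {
    ("lists", "shard1"): 1.0,
    ("items", "2020"): 1.0,
    ("items", "2019"): 0.75,
    ("items", "2018"): 0.5,
    ("items", "2017"): 0.5,
}


def target_replicas(num_nodes):
    """Return {(collection, shard): replica count} for a cluster of num_nodes."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    # At least 2 replicas per shard once there is more than one node.
    min_replicas = 2 if num_nodes > 1 else 1
    targets = {}
    for key, fraction in NODE_FRACTION.items():
        wanted = math.ceil(fraction * num_nodes)  # assumption: round up
        # Never more replicas than nodes, so no node holds a shard twice.
        targets[key] = min(num_nodes, max(min_replicas, wanted))
    return targets
```

With 4 nodes this gives 4 replicas for 2020 and "lists", 3 for 2019, and 2
each for 2018 and 2017. With 1 or 2 nodes every shard ends up on every node.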

E.g., with 1 node, I want to see this topology:
A: items: 2020, 2019, 2018, 2017 + lists

with 2 nodes:
A: items: 2020, 2019, 2018, 2017 + lists
B: items: 2020, 2019, 2018, 2017 + lists

and if I add two more nodes:
A: items: 2020, 2019, 2018 + lists
B: items: 2020, 2019, 2017 + lists
C: items: 2020, 2019, 2017 + lists
D: items: 2020, 2018 + lists
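
The kind of placement shown in these topologies can be sketched with a
simple greedy assignment: each shard goes onto the least-loaded nodes, one
replica per node. This only illustrates the desired outcome and is not how
Solr's autoscaling framework computes placements. The function name and
the input format are hypothetical.

```python
def place_replicas(targets, nodes):
    """Assign replicas to nodes.

    targets: {shard name: replica count}; nodes: list of node names.
    Returns {node: [shard names]} with at most one replica of a shard per
    node and core counts that differ by at most one across nodes.
    """
    placement = {node: [] for node in nodes}
    # Place the most widely replicated shards first.
    for shard, count in sorted(targets.items(), key=lambda kv: -kv[1]):
        if count > len(nodes):
            raise ValueError(f"{shard}: more replicas than nodes")
        # sorted() is stable, so ties are broken in the given node order.
        least_loaded = sorted(nodes, key=lambda n: len(placement[n]))
        # Taking `count` distinct nodes guarantees one replica per node.
        for node in least_loaded[:count]:
            placement[node].append(shard)
    return placement


# Replica counts for the 4-node case described above.
four_node_targets = {"lists": 4, "2020": 4, "2019": 3, "2018": 2, "2017": 2}
layout = place_replicas(four_node_targets, ["A", "B", "C", "D"])
```

For the 4-node case this places 15 cores in total, so three nodes hold 4
cores and one holds 3, the same balance as in the topology above.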

To the questions:

* The type of replica created when nodeAdded is triggered can't be set
per collection. Either everything gets NRT or everything gets TLOG.
Even if I specify nrtReplicas=0 when creating a collection, nodeAdded
will add NRT replicas if configured that way.
* I'm having difficulty expressing these rules in terms of a policy -
I can't seem to figure out a way to specify the number of replicas for
a shard based upon the total number of nodes.
* Is this beyond the current scope of autoscaling triggers/policies?
Should I instead use the trigger with a custom plugin action (or have
it call a web hook) to be a bit more intelligent?
* Am I wasting my time trying to ensure there are more replicas of the
hotter shards than the colder shards? It seems to add a lot of
complexity. Should I instead just accept that the colder shards aren't
getting queried much, so won't be using up cache space that the hot
shards need? Disk space is pretty cheap after all (total size for
"items" + "lists" is under 60GB).

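For the web hook option, one possible shape is a small handler that issues
Collections API ADDREPLICA calls with the replica type chosen per
collection. The sketch below only builds the request URL. The function
name, the base URL and the node name are assumptions for illustration, and
the per-collection type mapping simply restates the goals listed earlier.

```python
from urllib.parse import urlencode

# Replica type wanted for each collection (TLOG for "items", NRT for "lists").
REPLICA_TYPE = {"items": "tlog", "lists": "nrt"}


def add_replica_url(base_url, collection, shard, node):
    """Build a Collections API ADDREPLICA URL with a per-collection type."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
        "type": REPLICA_TYPE[collection],
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical base URL and node name; adjust to the real cluster.
url = add_replica_url("http://localhost:8983/solr", "items", "2020",
                      "host1:8983_solr")
```

A handler built this way would decide the replica type itself rather than
relying on the single type configured for the nodeAdded trigger.
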
Cheers

Tom
