Hi,

I'm trying to get a handle on the newer auto-scaling features in Solr. We're in the process of upgrading an older SolrCloud cluster from 5.5 to 8.5, and re-architecting it slightly to improve performance and automate operations.
If I boil it down slightly, we currently have two collections, "items" and "lists". Both collections have just one shard. We publish new data to "items" once each day, and our users search and do analysis on it, whilst "lists" contains NRT user-specified collections of ids from "items", which we join to from "items" so that users can restrict their searches/analysis to just the docs in their curated lists.

Most of our searches have specific date ranges in them, usually only from the last 3 years or so, but sometimes we need to search across all the data.

With the new setup, we want to:

* shard by date (year) to make the hottest data available in smaller shards
* have more nodes hosting these hot shards than host the older data
* be able to add/remove nodes predictably based upon our clients' (predictable) query load
* use TLOG for "items" and NRT for "lists", to avoid unnecessary indexing load for "items" while keeping NRT for "lists"
* spread cores across two AZs

With that in mind, I came up with a bunch of simplified rules for testing, with just 4 shards for "items":

* "lists" collection has one NRT replica on each node
* "items" collection shard 2020 has one TLOG replica on each node
* "items" collection shard 2019 has one TLOG replica on 75% of nodes
* "items" collection shards 2018 and 2017 each have one TLOG replica on 50% of nodes
* all shards have at least 2 replicas if number of nodes > 1
* no node should have 2 replicas of the same shard
* number of cores should be balanced across nodes

E.g., with 1 node, I want to see this topology:

  A: items: 2020, 2019, 2018, 2017 + lists

with 2 nodes:

  A: items: 2020, 2019, 2018, 2017 + lists
  B: items: 2020, 2019, 2018, 2017 + lists

and if I add two more nodes:

  A: items: 2020, 2019, 2018 + lists
  B: items: 2020, 2019, 2017 + lists
  C: items: 2020, 2019, 2017 + lists
  D: items: 2020, 2018 + lists

To the questions:

* The type of replica created when nodeAdded is triggered can't be set per collection. Either everything gets NRT or everything gets TLOG. Even if I specify nrtReplicas=0 when creating a collection, nodeAdded will add NRT replicas if configured that way.
* I'm having difficulty expressing these rules in terms of a policy. I can't seem to figure out a way to specify the number of replicas for a shard based upon the total number of nodes.
* Is this beyond the current scope of autoscaling triggers/policies? Should I instead use the trigger with a custom plugin action (or have it trigger a web hook) to be a bit more intelligent?
* Am I wasting my time trying to ensure there are more replicas of the hotter shards than the colder shards? It seems to add a lot of complexity. Should I instead just assume that the colder shards aren't getting queried much, and so won't be using up cache space that the hot shards need? Disk space is pretty cheap after all (total size for "items" + "lists" is under 60GB).

Cheers

Tom
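
P.S. In case it makes the replica-count rules easier to follow, here is the logic I'm after as a rough Python sketch. This is not Solr API code: the names SHARD_NODE_FRACTION and target_replicas are made up for illustration, and rounding up when the percentages don't divide evenly is my own assumption (the rules above don't say how to round).

```python
import math

# Fraction of nodes that should host a TLOG replica of each "items" shard,
# taken from the rules in this mail. The dict name is illustrative only.
SHARD_NODE_FRACTION = {"2020": 1.0, "2019": 0.75, "2018": 0.5, "2017": 0.5}


def target_replicas(num_nodes):
    """Return the wanted replica count per "items" shard for a cluster size.

    Assumption: fractional results are rounded up.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Rule: every shard has at least 2 replicas once there is more than 1 node.
    min_replicas = 2 if num_nodes > 1 else 1

    targets = {}
    for shard, fraction in SHARD_NODE_FRACTION.items():
        wanted = math.ceil(fraction * num_nodes)
        # Rule: no node holds 2 replicas of the same shard, so cap at num_nodes.
        targets[shard] = min(num_nodes, max(min_replicas, wanted))
    return targets


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, target_replicas(n))
```

For 1, 2 and 4 nodes this gives the same per-shard replica counts as the topologies listed above. What I can't work out is how to get an autoscaling policy to express the same thing.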