On Wed, Mar 16, 2016 at 4:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 3/16/2016 8:14 AM, Tom Evans wrote:
>> The problem occurs when we attempt to query a node to see if products
>> or items is active on that node. The balancer (haproxy) requests the
>> ping handler for the appropriate collection, however all the nodes
>> return OK for all the collections(!)
>>
>> Eg, on node01, it has replicas for products and skus, but the ping
>> handler for /solr/items/admin/ping returns 200!
>
> This returns OK because as long as one replica for every shard in
> "items" is available somewhere in the cloud, you can make a request for
> "items" on that node and it will work. Or at least it *should* work,
> and if it's not working, that's a bug. I remember that one of the older
> 4.x versions *did* have a bug where queries for a collection would only
> work if the node actually contained shards for that collection.

Sorry, this is Solr 5.5, I should have said.

Yes, we can absolutely make a request of "items", and it will work
correctly. However, we are making requests of "skus" that join to
"products", and when such a query is routed to a node which has only
"skus" and "items", the request fails because joins can only work over
local replicas.

To fix this, we now have two additional balancers alongside the main one:

solr: has all the nodes, all nodes are valid backends
solr-items: has all the nodes in the cluster, but a node is only a
valid backend if it has "items" and "skus" replicas
solr-products: has all the nodes in the cluster, but a node is only a
valid backend if it has "products" and "skus" replicas

(I'm simplifying things a bit, there are another 6 collections that
are on all nodes, hence the main balancer.)

The new balancers need a cheap way of checking which nodes are valid,
and ideally I'd like that check to not involve a query with a join
clause!

Cheers

Tom
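
For illustration only, here is a rough sketch of one way such a check
might be done without running a join: ask the Collections API for
CLUSTERSTATUS and work out which nodes host an active replica of every
collection a balancer cares about. The function names and the base URL
are hypothetical, and the response layout (cluster -> collections ->
shards -> replicas, plus live_nodes) is assumed rather than verified
against 5.5. It also treats a node as valid if it hosts at least one
active replica of each collection, which may not be enough for a join
when a collection has several shards.

```python
import json
from urllib.request import urlopen


def nodes_hosting_all(cluster_status, required_collections):
    """Return node names holding an active replica of every required collection.

    cluster_status is the parsed JSON of a CLUSTERSTATUS response
    (layout assumed, see the note above).
    """
    cluster = cluster_status["cluster"]
    collections = cluster.get("collections", {})
    eligible = None
    for name in required_collections:
        nodes = set()
        for shard in collections.get(name, {}).get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if replica.get("state") == "active":
                    nodes.add(replica["node_name"])
        # Keep only nodes that also qualified for the earlier collections.
        eligible = nodes if eligible is None else eligible & nodes
    eligible = eligible or set()
    # If the response lists live nodes, drop any node that is not live.
    if "live_nodes" in cluster:
        eligible &= set(cluster["live_nodes"])
    return eligible


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS; base_url is hypothetical, e.g. http://node01:8983/solr"""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

Something like nodes_hosting_all(fetch_cluster_status(base_url),
["products", "skus"]) would then give the candidate backends for
solr-products. How that result gets fed to haproxy is left open here.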