Hi all

I have a cloud setup with 8 nodes and 3 collections: products, items
and skus. Each collection has just one shard; products has 6
replicas, items has 2 replicas and skus has 8 replicas. No node has
both products and items; all nodes have skus.

Some of our queries join from sku to either products or items. If the
query is directed at a node without the appropriate shard on it, we
obviously get an error, so we have separate balancers for products and
items.

The problem occurs when we attempt to query a node to see if products
or items is active on that node. The balancer (haproxy) requests the
ping handler for the appropriate collection, however all the nodes
return OK for all the collections(!)

E.g. node01 has replicas for products and skus, but the ping
handler for /solr/items/admin/ping returns 200!

This means that as far as the balancer is concerned, node01 is a valid
destination for item queries, and inevitably it blows up as soon as
such a query is made to it.

As I understand it, this is because the URL we are checking is for the
collection ("items") rather than a specific core
("items_shard1_replica1").
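
To make the distinction concrete, here is a rough sketch of the two
kinds of URL. The default port 8983 and the helper names are
assumptions for illustration only, and it is also an assumption
(not something verified) that the core-level URL fails on a node
which does not host that core.

```python
from urllib.error import URLError
from urllib.request import urlopen


def ping_url(host, name, port=8983):
    """Build a ping URL; `name` may be a collection or a core name.

    The port is an assumed default, not taken from our setup.
    """
    return f"http://{host}:{port}/solr/{name}/admin/ping"


def is_up(url, timeout=2.0):
    """Return True only if the ping handler answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # HTTP errors (e.g. 404), refused connections and timeouts all count as down.
        return False


# Collection-level URL: what the balancer checks today.
collection_url = ping_url("node01", "items")
# Core-level URL: names one specific replica instead.
core_url = ping_url("node01", "items_shard1_replica1")
```

The two URLs differ only in the path segment after /solr/, so the
balancer would need to know the core name for each node.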

Is there a way to make the ping handler only check local shards? I
have tried with distrib=false&preferLocalShards=false, but it still
returns a 200.

The option I'm trying now is to make two ping handlers for skus, each
of which joins to one of items/products and so should fail on the
servers that do not host that collection, but I am concerned that this
is a little heavyweight for a status check to see whether we can
direct requests at this server or not.
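
For reference, the kind of request such a check would issue looks
roughly like the sketch below. The field names (id, parent_id), the
port and the helper name are hypothetical placeholders, not our real
schema; rows=0 is there only to keep the response as small as
possible.

```python
from urllib.parse import urlencode


def join_check_url(host, from_index, from_field="id",
                   to_field="parent_id", port=8983):
    """Build a skus query that joins against `from_index`.

    Field names and port are hypothetical placeholders.
    """
    params = {
        # Join query parser: match skus docs via docs in the other collection.
        "q": f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}*:*",
        # We only care whether the query succeeds, not about the results.
        "rows": 0,
    }
    return f"http://{host}:{port}/solr/skus/select?{urlencode(params)}"


items_check = join_check_url("node01", "items")
products_check = join_check_url("node01", "products")
```

On a node without a local items replica the first URL is the one
expected to return an error, which is what the balancer would key on.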

Cheers

Tom
