On Wed, Mar 16, 2016 at 4:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 3/16/2016 8:14 AM, Tom Evans wrote:
>> The problem occurs when we attempt to query a node to see if products
>> or items is active on that node. The balancer (haproxy) requests the
>> ping handler for the appropriate collection, however all the nodes
>> return OK for all the collections(!)
>>
>> Eg, on node01, it has replicas for products and skus, but the ping
>> handler for /solr/items/admin/ping returns 200!
>
> This returns OK because as long as one replica for every shard in
> "items" is available somewhere in the cloud, you can make a request for
> "items" on that node and it will work.  Or at least it *should* work,
> and if it's not working, that's a bug.  I remember that one of the older
> 4.x versions *did* have a bug where queries for a collection would only
> work if the node actually contained shards for that collection.

Sorry, this is Solr 5.5, I should have said.

Yes, we can absolutely make a request of "items", and it will work
correctly. However, we are making requests of "skus" that join to
"products", and the query is routed to a node which has only "skus"
and "items", and the request fails because joins can only work over
local replicas.

To fix this, we now have two additional balancers alongside the main one:

solr: has all the nodes, all nodes are valid backends
solr-items: has all the nodes in the cluster, but a node is only a valid
backend if it has "items" and "skus" replicas.
solr-products: has all the nodes in the cluster, but a node is only a
valid backend if it has "products" and "skus" replicas.

(I'm simplifying things a bit, there are another 6 collections that
are on all nodes, hence the main balancer.)

The new balancers need a cheap way of checking which nodes are valid,
and ideally I'd like that check to not involve a query with a join
clause!
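
One possible shape for such a check, as a rough sketch only: ask the
Collections API for CLUSTERSTATUS and see whether the node hosts an active
replica of every collection the balancer cares about. The response layout
assumed here (cluster -> collections -> shards -> replicas, each replica
carrying "node_name" and "state") and the node name format are assumptions
to verify against a real 5.5 response, and the function names are made up
for illustration. The balancer would still need something small to call it,
such as an external check script.

```python
import json
from urllib.request import urlopen


def collections_on_node(cluster_status, node_name):
    """Return the collections that have an active replica on node_name.

    cluster_status is the parsed JSON of a CLUSTERSTATUS response; the
    nesting used below is an assumption about that response's layout.
    """
    found = set()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                if (replica.get("node_name") == node_name
                        and replica.get("state") == "active"):
                    found.add(coll_name)
    return found


def node_is_valid_backend(cluster_status, node_name, required):
    """True if node_name hosts an active replica of every required collection."""
    return set(required) <= collections_on_node(cluster_status, node_name)


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as JSON; base_url is e.g. http://node01:8983/solr."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

For the solr-products balancer the check would then be something like
node_is_valid_backend(status, "node01:8983_solr", ["products", "skus"]),
which involves no query and no join.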

Cheers

Tom
