In a number of circumstances, CloudSolrClient and various parts of Solr (distributed search, distributed indexing) will create a request routed to a specific core in SolrCloud. But the routing is almost always done plainly, with a URL path like /solr/foo/handler, and SolrCloud (specifically HttpSolrCall) doesn't know if "foo" is a core or a collection, so it tries both. Sometimes the core once existed but no longer does, due to replica rebalancing activities. The resulting 404 response code is rather sad. Depending on who the caller is and whether the request had a payload (e.g. indexing), the caller may or may not know how to retry with an updated ClusterState, or even know whether its ClusterState is stale. Requests with payloads are not retry-able. If the request somehow had clarity on the intended shard, at least, Solr could then handle it locally or proxy it to a suitable node, and use response headers to hint to the caller that it might want to get a new ClusterState.

A partial fix is for such requests to always add the "collection" parameter when routing to a core. However, it's only suitable when any core of the collection is a reasonable substitute if the preferred/original core doesn't resolve. That'd work for indexing since it routes by payload content, but not for distributed search (isShard=true), which demands a particular shard. I'm not a fan of the very existence of the "collection" parameter either[1]. I strongly think important routing information, particularly the collection you are talking to (!), should be in the path. A naively written proxy might have a security issue if its developers didn't know that a request to one collection can be pointed at another that wasn't intended to be accessible.

I'd rather see a more holistic, elegant refactoring instead of adding another parameter. Here's a straw-man proposal that uses URL matrix parameters to parameterize the routing before/separate from query parameters. I'll show some examples (assume SolrCloud mode).

Existing scenarios:
  /solr/collectionName/handlerName
  /solr/aliasName/handlerName
  /solr/collection1,collection2,collection3/handlerName
  /solr/coreName/handlerName  (would like this to go away in SolrCloud)

New scenarios:
  /solr/collectionName;s=shardName/handlerName
  /solr/collectionName;s=shardName;r=replicaName/handlerName
  /solr/collectionName;s=shardName;leader=true/handlerName

If matrix parameters are present (presence of a semicolon), SolrCloud can know collectionName is a collection name (and not an alias or a core). "s" means shard name, "r" means replica name (which might rarely be used[2]). The single-char choices are the same as used in our logging pattern for MDC. "leader=true" is for the leader, of course. Matrix parameters are extensible; we might see fit to add "x" for the core name, or other parameters similar to those of shards.preference[3].

Any thoughts on this?
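
To make the straw-man a bit more concrete, here is a minimal sketch of how the first path segment could be split into its name and matrix parameters. Everything in it is an assumption for illustration: RoutingSpec, its fields, and parse() are hypothetical names, not existing Solr classes or methods, and it ignores URL decoding and validation entirely.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for routing info parsed from a request path; not an existing Solr class. */
final class RoutingSpec {
  final String collSpec;            // collection, alias, comma list, or (legacy) core name
  final String handler;             // remainder of the path, e.g. "/handlerName"
  final boolean hasMatrixParams;    // true if a semicolon was present in the first segment
  final Map<String, String> matrixParams; // e.g. s, r, leader, in order of appearance

  private RoutingSpec(String collSpec, String handler, boolean hasMatrixParams,
                      Map<String, String> matrixParams) {
    this.collSpec = collSpec;
    this.handler = handler;
    this.hasMatrixParams = hasMatrixParams;
    this.matrixParams = Collections.unmodifiableMap(matrixParams);
  }

  /** Parses a path such as /solr/collectionName;s=shardName/handlerName (no URL decoding). */
  static RoutingSpec parse(String path) {
    final String prefix = "/solr/";
    if (!path.startsWith(prefix)) {
      throw new IllegalArgumentException("Not a /solr/ path: " + path);
    }
    String rest = path.substring(prefix.length());
    int slash = rest.indexOf('/');
    String firstSegment = slash < 0 ? rest : rest.substring(0, slash);
    String handler = slash < 0 ? "" : rest.substring(slash);

    // Everything before the first ';' is the name; the rest are key=value matrix parameters.
    String[] parts = firstSegment.split(";");
    Map<String, String> params = new LinkedHashMap<>();
    for (int i = 1; i < parts.length; i++) {
      if (parts[i].isEmpty()) {
        continue; // tolerate ";;"
      }
      int eq = parts[i].indexOf('=');
      String key = eq < 0 ? parts[i] : parts[i].substring(0, eq);
      String value = eq < 0 ? "" : parts[i].substring(eq + 1);
      params.put(key, value);
    }
    return new RoutingSpec(parts[0], handler, firstSegment.indexOf(';') >= 0, params);
  }

  /** Shard name from "s", or null if not specified. */
  String shard() {
    return matrixParams.get("s");
  }

  /** Replica name from "r", or null if not specified. */
  String replica() {
    return matrixParams.get("r");
  }

  /** True only when "leader=true" was given. */
  boolean leaderOnly() {
    return Boolean.parseBoolean(matrixParams.get("leader"));
  }
}
```

With this sketch, RoutingSpec.parse("/solr/collectionName;s=shardName;leader=true/handlerName") yields collSpec "collectionName", shard() "shardName", and leaderOnly() true, while a path without a semicolon leaves hasMatrixParams false, so the caller would still have to resolve the name as a collection, alias, or core as it does today.
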

Java parameter names might use the term "collSpec" or something similar to indicate that the input isn't necessarily a collection.

[1] The "collection" param was added as part of SOLR-4497 for Collection Aliasing, but it wasn't necessary. Years later, when aliasing was improved (by me), the path component supported a comma-delimited list. But "collection" should probably have been deprecated. If you think not, what am I missing?

[2] Specifying the replica *on a specific node* is probably always redundant, since there is very likely exactly one or zero replicas for the shard there. If there's more than one, either will do (they are replicas). It could be interesting if the client could detect the redundancy and be more specific only then, but that's probably unnecessary. I bet tests overload replicas per shard on a node, however.

[3] https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#shards-preference-parameter

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley