In a number of circumstances, CloudSolrClient and various parts of Solr
(distributed search, distributed indexing) will create a request routed to
a specific core in SolrCloud.  But the routing is almost always done
plainly, with a URL path like /solr/foo/handler, and SolrCloud
(specifically HttpSolrCall) doesn't know if "foo" is a core or a
collection, so it tries both.  Sometimes the core once existed but no
longer does, due to replica rebalancing activities.  The resulting 404
response is rather sad.  Depending on who the caller is and whether the
request had a payload (e.g. indexing), it may or may not know how to retry
with an updated ClusterState, or even know whether its ClusterState is
stale.  Requests with payloads are not retry-able.  If the request somehow
had clarity on the intended shard, at least, Solr could then handle it
locally or proxy it to a suitable node, and use a response header to hint
to the caller that it might want to get a new ClusterState.

A partial fix is for such requests to always add the "collection" parameter
when routing to a core.  However, that's only suitable when any core of the
collection is a reasonable substitute if the preferred/original core
doesn't resolve.  That'd work for indexing, since it routes by payload
content, but not for distributed search (isShard=true), which demands a
particular shard.

I'm not a fan of the very existence of the "collection" parameter
either[1].  I strongly think important routing information, particularly
the collection you are talking to (!), should be in the path.  A naively
written proxy might have a security issue if its developers didn't know
that a request to one collection can be redirected, via this parameter, to
another that wasn't intended to be accessible.

I'd rather see a more holistic, elegant refactoring instead of adding
another parameter.  Here's a straw-man proposal that uses URL matrix
parameters to parameterize the routing before, and separately from, query
parameters.  I'll show some examples (assume SolrCloud mode):

Existing scenarios:
  /solr/collectionName/handlerName
  /solr/aliasName/handlerName
  /solr/collection1,collection2,collection3/handlerName
  /solr/coreName/handlerName  (would like this to go away in SolrCloud)

New scenarios:
  /solr/collectionName;s=shardName/handlerName
  /solr/collectionName;s=shardName;r=replicaName/handlerName
  /solr/collectionName;s=shardName;leader=true/handlerName

If matrix parameters are present (i.e. there's a semicolon), SolrCloud can
know that collectionName is a collection name (and not an alias or a core).
"s" means shard name, "r" means replica name (which might rarely be
used[2]).  The single-char choices are the same as those used in our
logging pattern for MDC.  "leader=true" selects the leader, of course.
Matrix parameters are extensible; we might see fit to add "x" for the core
name, or other parameters similar to those of shards.preference[3].
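
To make the parsing side of this concrete, here is a rough Java sketch of
how one path segment could be split into a name plus matrix parameters.
The CollSpec class and all of its methods are hypothetical names for
illustration only; nothing like this exists in Solr today, and it assumes
the segment has already been URL-decoded.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not Solr API: one path segment such as "coll;s=shard1;leader=true". */
final class CollSpec {
  private final String name;
  private final boolean hasMatrixParams;
  private final Map<String, String> params;

  private CollSpec(String name, boolean hasMatrixParams, Map<String, String> params) {
    this.name = name;
    this.hasMatrixParams = hasMatrixParams;
    this.params = Collections.unmodifiableMap(params);
  }

  /** Splits the segment at the first ';' and reads the rest as key=value pairs. */
  static CollSpec parse(String segment) {
    int semi = segment.indexOf(';');
    if (semi < 0) {
      // No semicolon: could be a collection, an alias, a list, or a core name.
      return new CollSpec(segment, false, new LinkedHashMap<>());
    }
    Map<String, String> params = new LinkedHashMap<>();
    for (String pair : segment.substring(semi + 1).split(";")) {
      if (pair.isEmpty()) {
        continue; // tolerate "name;" and ";;"
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      String value = eq < 0 ? "" : pair.substring(eq + 1);
      params.put(key, value);
    }
    return new CollSpec(segment.substring(0, semi), true, params);
  }

  /** The part before the first semicolon. */
  String name() { return name; }

  /** True if a semicolon was present, i.e. the name is known to be a collection. */
  boolean hasMatrixParams() { return hasMatrixParams; }

  /** Value of "s", or null if absent. */
  String shard() { return params.get("s"); }

  /** Value of "r", or null if absent. */
  String replica() { return params.get("r"); }

  /** True only if "leader=true" was given. */
  boolean leader() { return Boolean.parseBoolean(params.get("leader")); }

  /** All matrix parameters in the order given, for future extensions. */
  Map<String, String> params() { return params; }
}
```

For example, CollSpec.parse("collectionName;s=shardName;leader=true") would
yield name "collectionName", shard "shardName", and leader true, while
CollSpec.parse("foo") reports no matrix parameters and leaves "foo" to the
existing core/collection/alias resolution.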

Any thoughts on this?

Java variable and parameter names might use the term "collSpec" or
something similar to indicate that the input isn't necessarily a
collection.

[1] The "collection" param was added as part of SOLR-4497 for Collection
Aliasing, but it wasn't necessary.  Years later, when aliasing was improved
(by me), the path component supported a comma-delimited list.  But
"collection" should probably have been deprecated.  If you think not, what
am I missing?
[2] Specifying the replica *on a specific node* is probably always
redundant since there is very likely exactly one or zero replicas for the
shard.  If there's more than one, either will do (they are replicas).  It
could be interesting if the client could detect the redundancy and then be
more specific only then but that's probably unnecessary.  I bet tests
overload replicas per shard on a node, however.
[3]
https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#shards-preference-parameter

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
