Re: SolrCloud request routing URL structure

Pierre Salagnac Tue, 14 Jan 2025 05:20:35 -0800

I'm not a big fan of matrix parameters to identify the resource we want to
target.


To me, as a guideline, the path of the URL is to target a resource, which
in the Solr case is a collection/shard/core and a handler. Then parameters,
or the query part of the URL, give other details like the action we want to
take. Adding parameters in the path of the resource mixes the two.

I'd rather the URL path always be explicit about the resource we target. I
like approaches such as:
- /solr/c/col1/select
- /solr/c/col1/s/shard1/select
- /solr/c/col1/r/core1/select
because there is no ambiguity at all. Each name is strictly bound to one
resource type or the other. That is more or less what v2 already offers.
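
To illustrate why such paths leave nothing to guess, here is a minimal sketch in Python. The function name, the Target type and the exact path grammar are assumptions made for illustration only; none of this is existing Solr code. Each name is typed by the marker segment that precedes it, so no cluster-state lookup is needed to decide what it refers to.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    collection: str
    shard: Optional[str]
    replica: Optional[str]
    handler: str


def parse_explicit_path(path: str) -> Target:
    """Parse /solr/c/<collection>[/s/<shard> | /r/<core>]/<handler>.

    Hypothetical grammar based on the three example paths above.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 4 or parts[0] != "solr" or parts[1] != "c":
        raise ValueError(f"not an explicit collection path: {path}")
    collection = parts[2]
    rest = parts[3:]
    shard = replica = None
    # A marker only counts when a name and a handler follow it.
    if len(rest) >= 3 and rest[0] in ("s", "r"):
        if rest[0] == "s":
            shard = rest[1]
        else:
            replica = rest[1]
        rest = rest[2:]
    return Target(collection, shard, replica, "/".join(rest))


print(parse_explicit_path("/solr/c/col1/s/shard1/select"))
```

The parser never has to ask whether "col1" might be a core or "shard1" a collection: the "c", "s" and "r" markers decide that up front.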

The issue with v1 API is ambiguity, and I'm concerned matrix parameters
won't be sufficient to resolve ambiguities. In the case of
/solr/my-id;r=core1/select , we still don't know whether my-id is a shard
or a collection. Both are conceptually valid.
Or can matrix parameters be required? I think such parameters are in
essence optional, but I'm definitely not an expert here.
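
A rough sketch of what I mean, in Python, with a hypothetical helper name (this is not how HttpSolrCall parses anything today). Splitting a path segment on semicolons gives a bare name plus optional key/value pairs, and nothing in the result says what kind of resource the name is, unless we add a convention such as the proposal's "a semicolon implies a collection".

```python
from typing import Dict, Tuple


def split_matrix_segment(segment: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name;k1=v1;k2=v2' into the name and its matrix parameters."""
    name, *pairs = segment.split(";")
    params: Dict[str, str] = {}
    for pair in pairs:
        if not pair:
            continue  # tolerate a trailing or doubled semicolon
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params


name, params = split_matrix_segment("my-id;r=core1")
# name is "my-id" and params is {"r": "core1"}; whether "my-id" is a
# shard or a collection is still not encoded anywhere in the segment.
print(name, params)
```

A segment without any semicolon parses just as well, which is why I see these parameters as optional by nature.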


On Fri, Jan 10, 2025 at 15:11, Jason Gerlowski <[email protected]> wrote:

> Hi,
>
> Sorry for the late reply on this thread.  I've been trying to take
> time to understand the problem a little better before chiming in.
> (FWIW I think this would be a great discussion for the Meetup next
> week?)
>
> I think much of what you described makes sense in a v1 context, but v2
> has some nice building blocks for solving this, if not a complete
> solution already.
>
> Particularly the v2 paths avoid much of the ambiguity present in v1.
> e.g. "/solr/???/select" is replaced in v2 with
> "/collections/someCollName/select" and "/cores/someCoreName/select".
> There's not currently a "shard-level" path for querying, but other
> APIs currently exist at the
> "/collections/someColl/shards/someShardName" path, so it's only a
> small leap to offering querying there.  Together these paths could be
> used much as Hoss described in the "Long Term Strawman" section of his
> response.
>
> Of course, v2 completion remains pretty distant and we may want a
> solution in the interim.  Matrix-params and Hoss's "header hint" idea
> both seem like reasonable approaches in that regard, though I'd
> personally lean towards the header-based approach.
>
> Best,
>
> Jason
>
>
> On Thu, Jan 2, 2025 at 9:46 AM David Smiley <[email protected]> wrote:
> >
> > I'd like to move forward on this soon.
> >
> > A colleague expressed concern about matrix-params being obscure, so
> > perhaps they should be avoided because of that.  However, to me it seems
> > their use is well fitting to the problem.  Also, I estimate this path will
> > have relatively low impact on the codebase with respect to continuing to
> > have Replica.getXXXUrl methods that give a base URL (thus without params).
> > The URL (without params) will encode sufficient information that
> > HttpSolrCall needs.
> >
> > On Tue, Dec 10, 2024 at 5:36 PM Chris Hostetter <[email protected]> wrote:
> >
> > >
> > > I don't disagree with any of your points.  If anything I think the
> > > problem is more egregious than you characterize it (especially in the
> > > case of requests for specific replicas that have been (re)moved -- IIRC
> > > not only does the Solr node return 404, but before doing that it
> > > forcibly refreshes the entire cluster state to see if there is a "new"
> > > collection it doesn't know about with that name).
> > >
> > > The one thing I think you may be overlooking is in your comment that
> > > you'd like to see requests to specific cores go away -- presumably
> > > because you feel like shard specificity is enough for sub-requests?  But
> > > being able to target a specific core with requests is kind of important
> > > for diagnosing bugs/discrepancies.  Even in a perfectly functioning
> > > system, features like shards.preference depend on being able to route a
> > > request to a specific replica on a node -- not just any replica of that
> > > shard (i.e., prefer PULL replicas).
> > >
> > >
> > > I don't have any strong objections to your "matrixized" path param, but
> > > I would suggest two alternative strawmen:
> > >
> > > * Long Term Strawman *
> > >
> > > In a "Post V2 API" type world, it seems like what we should probably be
> > > doing is switching to completely different path prefix(es) for requests
> > > targeting a specific shard/replica?
> > >
> > > We already have "/api/c/<collection-name>/<handler-name>" -- it seems
> > > like ideally /api/c/* should *require* that the next portion of the path
> > > be an actual collection name, and when sub-requests are made, or when
> > > clients want to route requests to specific replicas, those requests
> > > should go to some *new* paths (that don't have the baggage of
> > > resolving/proxying collection level requests)
> > >
> > > Perhaps
> > >  - /api/s/<collection-name>/<shard-name>/...
> > >    "any replica of <shard-name> available on this solr node"
> > >
> > >  - /api/r/<collection-name>/<replica-name>/...
> > >    "the specific replica <replica-name> if it's on this solr node"
> > >
> > >
> > >
> > > * Short Term / Backcompat Strawman *
> > >
> > > Would (optional) HTTP headers like "X-Solr-Collection" and
> > > "X-Solr-Replica" be easier to adopt than matrixizing
> > > the URL path?
> > >
> > > If those headers don't exist, then the existing logic can all still run.
> > >
> > > If those headers do exist, then Solr can compare the values of those
> > > headers with the path info to help optimize away some of the existing
> > > "Is this path a collection name or a core name" type logic (and/or
> > > narrow down which shard to pick from if it is a collection name)
> > >
> > > I'm not suggesting that these headers would *override* the path, just
> > > serve as hints to reduce the "search space" in HttpSolrCall...
> > >
> > > Example #0
> > >
> > >  GET /solr/yak/select?...
> > >
> > >  * no hints what yak is
> > >  * all existing heuristics apply
> > >
> > > Example #1
> > >
> > >  GET /solr/foo/select?...
> > >  X-Solr-Replica: foo
> > >
> > >  * foo is expected to be the name of a specific (local) replica
> > >  * if a SolrCore named foo doesn't exist on the current node,
> > >    just return 404, don't bother looking for a collection named foo
> > >
> > > Example #2
> > >
> > >  GET /solr/bar/select?...
> > >  X-Solr-Collection: bar
> > >
> > >  * bar is expected to be the name of a collection
> > >  * if bar isn't a valid collection name, just return 404,
> > >    don't bother checking for a local SolrCore named bar
> > >
> > > Example #3
> > >
> > >  GET /solr/bar/select?...
> > >  X-Solr-Collection: bar
> > >  X-Solr-Replica: foo
> > >
> > >  * bar is expected to be the name of a collection
> > >  * if bar isn't a valid collection name, just return 404,
> > >    don't bother checking for a local SolrCore named bar
> > >  * foo is expected to be the name of a specific (local) replica
> > >    of the collection named bar
> > >  * if a SolrCore named foo doesn't exist on the current node *OR*
> > >    if a SolrCore named foo does exist, but isn't a replica of
> > >    collection bar, just return 404, don't bother picking an
> > >    arbitrary replica of collection bar
> > >
> > > Example #4
> > >
> > >  GET /solr/yak/select?...
> > >  X-Solr-Collection: bar
> > >  X-Solr-Replica: foo
> > >
> > >  * neither hint matches path, return 404
> > >
> > >
> > >
> > >
> > > : Date: Mon, 9 Dec 2024 23:42:45 -0500
> > > : From: David Smiley <[email protected]>
> > > : Reply-To: [email protected]
> > > : To: [email protected]
> > > : Subject: SolrCloud request routing URL structure
> > > :
> > > : In a number of circumstances, CloudSolrClient and various parts of Solr
> > > : (distributed search, distributed indexing), will create a request
> > > : routed to a specific core in SolrCloud.  But the routing is almost
> > > : always done plainly, with the URL path like /solr/foo/handler and
> > > : SolrCloud (specifically HttpSolrCall) doesn't know if "foo" is a core
> > > : or a collection, so it tries both.  Sometimes, the core once existed
> > > : but doesn't any longer due to replica rebalancing activities.  The 404
> > > : response code is rather sad.  Depending on who the caller is and
> > > : whether the request had a payload (e.g. indexing), it may or may not
> > > : know how to retry with an updated ClusterState or even know if its
> > > : ClusterState is stale.  Payloads are not retry-able.  If the request
> > > : somehow had clarity on the intended shard, at least, Solr could then
> > > : handle it locally or proxy it to a suitable node, and use response
> > > : headers containing a hint to the caller that it might want to get a
> > > : new ClusterState.
> > > :
> > > : A partial fix is for such requests to always add the "collection"
> > > : parameter when routing to a core.  However, it's only suitable when
> > > : any core of the collection is a reasonable substitute if the
> > > : preferred/original core doesn't resolve.  That'd work for indexing
> > > : since it routes by payload content, but not distributed-search
> > > : (isShard=true) that demands a particular shard.
> > > :
> > > : I'm not a fan of the very existence of the "collection" parameter
> > > : either[1].  I strongly think important routing information,
> > > : particularly the collection you are talking to (!), should be in the
> > > : path.  A naively written proxy might have a security issue if its
> > > : developers didn't know that a request to a collection can be pointed
> > > : at another that wasn't intended to be accessible.
> > > :
> > > : I'd rather see a more holistic elegant refactoring instead of adding
> > > : another parameter.  Here's a straw-man proposal that uses URL matrix
> > > : parameters to parameterize the routing before/separate from query
> > > : parameters.  I'll show some examples (assume SolrCloud mode)
> > > :
> > > :   Existing scenarios:
> > > : /solr/collectionName/handlerName
> > > : /solr/aliasName/handlerName
> > > : /solr/collection1,collection2,collection3/handlerName
> > > : /solr/coreName/handlerName  (would like this to go away in SolrCloud)
> > > :   New scenarios:
> > > : /solr/collectionName;s=shardName/handlerName
> > > : /solr/collectionName;s=shardName;r=replicaName/handlerName
> > > : /solr/collectionName;s=shardName;leader=true/handlerName
> > > :
> > > : If matrix parameters are present (presence of a semicolon), SolrCloud
> > > : can know collectionName is a collection name (and not an alias or a
> > > : core).  "s" means shard name, "r" means replica name (which might
> > > : rarely be used[2]).  The single-char choices are the same as used in
> > > : our logging pattern for MDC.  "leader=true" for the leader of course.
> > > : Matrix parameters are extensible; we might see fit to add "x" for the
> > > : core name or other parameters similar to that of shards.preference[3]
> > > :
> > > : Any thoughts on this?
> > > :
> > > : Java variable name parameters might use the term "collSpec" or
> > > : something to indicate that the input isn't necessarily a collection.
> > > :
> > > : [1] "collection" param was added as part of SOLR-4497 for Collection
> > > : Aliasing but it wasn't necessary.  Years later when aliasing was
> > > : improved (by me), the path component supported a comma delimited list.
> > > : But "collection" should probably have been deprecated.  If you think
> > > : not; what am I missing?
> > > : [2] Specifying the replica *on a specific node* is probably always
> > > : redundant since there is very likely exactly one or zero replicas for
> > > : the shard.  If there's more than one, either will do (they are
> > > : replicas).  It could be interesting if the client could detect the
> > > : redundancy and then be more specific only then but that's probably
> > > : unnecessary.  I bet tests overload replicas per shard on a node,
> > > : however.
> > > : [3]
> > > : https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#shards-preference-parameter
> > > :
> > > : ~ David Smiley
> > > : Apache Lucene/Solr Search Developer
> > > : http://www.linkedin.com/in/davidwsmiley
> > > :
> > >
> > > -Hoss
> > > http://www.lucidworks.com/
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
>
