On Tue, Dec 10, 2024 at 5:36 PM Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> I don't disagree with any of your points.  If anything i think the problem
> is more egregious than you characterize it (especially in the case of
> requests for specific replicas that have been (re)moved -- IIRC not only
> does the Solr node return 404, but before doing that it forcibly refreshes
> the entire cluster state to see if there is a "new" collection it doesn't
> know about with that name)
>

I see the refresh of a specific collection but not "the entire cluster
state"; can you elaborate?  On a related note, I recently removed (in
main/10) the fallback logic that would loop over collections hunting for the
core name: SOLR-17568.

The one thing i think you may be overlooking is in your comment that you'd
> like to see requests to specific cores go away -- presumably because
> you feel like shard specificity is enough for sub-requests?


Yes... but I've reconsidered.  Typically the shard is enough, and it always
would be for requests coming from CloudSolrClient.


>   But
> being able to target a specific core with requests is kind of important
> for diagnosing bugs/discrepancies.


Nitpicking here on "being able to" -- indeed, I showed how it would still be
possible to do so, with "r" for a specific replica or "x" for a core name
(though I'm not sure why "x" would be used in SolrCloud at all versus "r").
Regardless, "x" is going to be logged to MDC.
Maybe you meant not merely that we "be able to", but that Solr should in fact
continue to make these requests with such specificity?

As I imagine implementing this, I think the specific callers/use-cases will
choose the specificity that makes sense for that use-case.  For example,
IndexFetcher fetching from the leader would use leader=true.  The distributed
indexing FROMLEADER phase would indicate the specific replica -- so that's
one case where I acknowledge the need.  Distributed search wouldn't care
which replica on the node it reaches.
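
To make that concrete, the URLs might look something like this under the
matrix-param idea (only "r", "x", and leader=true are tokens I've actually
proposed; the "s" shard token, the collection name, and the exact punctuation
here are placeholders):

  /solr/techproducts;s=shard1;leader=true/replication   (IndexFetcher, from the shard leader)
  /solr/techproducts;r=core_node5/update                (distributed indexing, FROMLEADER)
  /solr/techproducts;s=shard1/select                    (distributed search; any replica of that shard on the node)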

  Even in a perfectly functioning
> system, features like shards.preference depend on being able to route a
> request to a specific replica on a node -- not just any replica of that
> shard (ie: prefer PULL replicas)
>

shards.preference doesn't go away with my proposal; it still guides
selection of the intended node.  Normally there's only one replica of a
shard on a node anyway.
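(For instance, shards.preference=replica.type:PULL would still influence
which node each shard's sub-request gets routed to.)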


>
> I don't have any strong objections to your "matrixized" path param, but I
> would suggest two alternative strawmen:
>

I thought of both already...

Using more path elements feels awkward to me as it would put handlers at
different levels.  Furthermore, handlers can be registered with slashes as
well, and we have many that already are (/admin/luke).  That means
ambiguity -- is "admin" a shard?  I like the current relative simplicity of
/solr/collectionEtc/handler in terms of the number of slashes.  I'm also not
sure whether using more slashes would make our API renovation efforts more
awkward; Jason may have an opinion on that.
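
Coming back to the ambiguity point, with a made-up path:
/solr/techproducts/admin/luke could mean the /admin/luke handler on some
replica of techproducts, or a /luke handler on a shard literally named
"admin".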

I like the HTTP header proposal for its backwards-compatibility story since
an older Solr server would continue to function fine.  But as a practical
matter, on the client side, I think the coding realities would be a royal
PITA.  And these hints would be invisible in logs.  Ultimately, to me it feels
inelegant even if kind of clever.
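
To illustrate the client-side pain: every single request would need the hint
attached, roughly like the following with a plain JDK HttpClient.  The
"X-Solr-Target-Replica" header name is entirely made up here; plumbing
something like it through CloudSolrClient and every other sender per-request
is where I expect the real PITA.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  public class ReplicaHintHeaderExample {
    public static void main(String[] args) throws Exception {
      HttpClient http = HttpClient.newHttpClient();
      HttpRequest req = HttpRequest.newBuilder()
          .uri(URI.create("http://node1:8983/solr/techproducts/select?q=*:*"))
          // hypothetical header naming the replica/core to target on that node
          .header("X-Solr-Target-Replica", "core_node5")
          .GET()
          .build();
      HttpResponse<String> rsp = http.send(req, HttpResponse.BodyHandlers.ofString());
      System.out.println(rsp.statusCode());
    }
  }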

Most if not all places where CloudSolrClient/SolrCloud wishes to talk to a
core on a node, it obtains a Replica (from DocCollection/Slice) and
calls org.apache.solr.common.cloud.Replica#getCoreUrl.  Do a find-usages --
32 places in the production side of the codebase.  It would be really easy
to add getShardUrl and getReplicaUrl variants formatted as I describe!  At
least in principle... all those call sites will still translate to a bunch of
work, to say nothing of the receiver-side handling.  CloudSolrClient is the most
important sender for this feature, though.
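
For a sense of scale, the additions themselves could be tiny -- roughly like
this sketch, where the ";s="/";r=" tokens are assumptions (not committed
syntax) and I'm assuming Replica exposes, or can easily expose, its collection
and shard names alongside the existing getBaseUrl()/getName():

  // Sketch only; matrix tokens and format are assumptions.
  public String getShardUrl() {
    // e.g. http://host:8983/solr/techproducts;s=shard1
    return getBaseUrl() + "/" + getCollection() + ";s=" + getShard();
  }

  public String getReplicaUrl() {
    // e.g. http://host:8983/solr/techproducts;r=core_node5
    return getBaseUrl() + "/" + getCollection() + ";r=" + getName();
  }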

A challenge to my proposal is backwards-compatibility.  A system property
could toggle those methods back to the current getCoreUrl logic.  If the
server-side handling could make it into 9.9, that would help as well.
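
Something as small as this could be the escape hatch (the property name is
made up):

  // back-compat toggle; "solr.cloud.legacyCoreUrls" is a hypothetical property name
  private static final boolean LEGACY_CORE_URLS =
      Boolean.getBoolean("solr.cloud.legacyCoreUrls");

  public String getShardUrl() {
    return LEGACY_CORE_URLS ? getCoreUrl()
        : getBaseUrl() + "/" + getCollection() + ";s=" + getShard();
  }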
