On Tue, Dec 10, 2024 at 5:36 PM Chris Hostetter <hossman_luc...@fucit.org> wrote:
> I don't disagree with any of your points. If anything I think the problem
> is more egregious than you characterize it (especially in the case of
> requests for specific replicas that have been (re)moved -- IIRC not only
> does the Solr node return 404, but before doing that it forcibly refreshes
> the entire cluster state to see if there is a "new" collection it doesn't
> know about with that name)

I see the refresh of a specific collection but not "the entire cluster
state"; can you elaborate?

On a related note, I recently removed in main/10 the fallback logic that
would loop over collections hunting for the core name: SOLR-17568.

> The one thing I think you may be overlooking is in your comment that you'd
> like to see requests to specific cores go away -- presumably because
> you feel like shard specificity is enough for sub-requests?

Yes... but I'm reconsidering. At least typically, the shard is enough, and
it always would be from CloudSolrClient.

> But being able to target a specific core with requests is kind of important
> for diagnosing bugs/discrepancies.

Nitpicking here on "being able to": indeed, I showed how it would be
possible to do so with "r" for a specific replica or "x" for a core name,
albeit I'm not sure why "x" would be used in SolrCloud at all versus "r".
Regardless, "x" is going to be logged to MDC. Maybe you meant not only "be
able to" but in fact ensure Solr continues to do so with such specificity?
As I imagine implementing this, the specific callers/use-cases will choose
the specificity that makes sense for each use-case. For example,
IndexFetcher fetching from the leader would use leader=true. The
distributed indexing phase FROMLEADER would indicate the specific replica
-- so there's one case where I acknowledge the need. Distributed search
wouldn't care about which replica on the node.

> Even in a perfectly functioning system, features like shards.preference
> depend on being able to route a request to a specific replica on a node --
> not just any replica of that shard (ie: prefer PULL replicas)

shards.preference doesn't go away with my proposal; it still guides
selection of the intended node. Normally there's only one replica of a
shard on a node anyway.

> I don't have any strong objections to your "matrixized" path param, but I
> would suggest two alternative strawmen:

I thought of both already... Using more path elements feels awkward to me,
as it would put handlers at different levels. Furthermore, handlers can be
registered with slashes as well, and we have many that already are
(/admin/luke). That means ambiguity -- is "admin" a shard? I like the
current relative simplicity of /solr/collectionEtc/handler in terms of the
number of slashes. I'm also not sure whether using more slashes would make
our API renovation efforts more awkward; Jason may have an opinion on that.

I like the HTTP header proposal for its backwards-compatibility story,
since an older Solr server would continue to function fine. But as a
practical matter, on the client side, I think the coding realities would
be a royal PITA. And these hints are invisible in logs. Ultimately, it
feels inelegant to me, even if kind of clever.

Most if not all places where CloudSolrClient/SolrCloud wishes to talk to a
core on a node obtain a Replica (from DocCollection/Slice) and call
org.apache.solr.common.cloud.Replica#getCoreUrl. Do a find-usages -- 32
places in the production side of the codebase. It would be really easy to
add getShardUrl and getReplicaUrl variants formatted as I describe!
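To sketch what I mean (hypothetical method bodies only; the matrix-param
keys "s" and "r" are just my strawman syntax, and I'm assuming Replica's
existing getBaseUrl/getCollection/getShard/getName accessors):

    // Sketch of the proposed variants, living alongside Replica#getCoreUrl.
    // Nothing here is settled; the ";s=" / ";r=" keys are illustrative.

    // Target any replica of this shard on the node, e.g.
    //   http://host:8983/solr/myCollection;s=shard1
    public String getShardUrl() {
      return getBaseUrl() + "/" + getCollection() + ";s=" + getShard();
    }

    // Target this specific replica, e.g.
    //   http://host:8983/solr/myCollection;r=core_node4
    public String getReplicaUrl() {
      return getBaseUrl() + "/" + getCollection() + ";r=" + getName();
    }

A call site like distributed search could then swap getCoreUrl() for
getShardUrl() and stop caring which replica it lands on.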
At least in principle, anyway; in practice, all those call sites will
translate into a bunch of work, to say nothing of the receiver-side
handling. CloudSolrClient is the most important sender for this feature,
though. A challenge to my proposal is backwards-compatibility: a system
property could toggle those methods back to the current getCoreUrl logic.
If the server-side handling could make it into 9.9, that would help as
well.
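Something as simple as this could work for the toggle (the property name
is invented for illustration):

    // Hypothetical back-compat switch; the property name is made up.
    private static final boolean URL_BY_CORE =
        Boolean.getBoolean("solr.senders.useCoreUrls");

    public String getShardUrl() {
      if (URL_BY_CORE) {
        // Behave exactly like today for older servers (e.g. pre-9.9)
        // that don't understand the matrixized path.
        return getCoreUrl();
      }
      return getBaseUrl() + "/" + getCollection() + ";s=" + getShard();
    }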