Thanks for calling this out more explicitly; definitely worth discussing.

> If a client/caller/user lists collections and then loops them to take
> some action on them, it needs to be tolerant of the collection not working;
> may seem to not exist.

I'd go even a step further and say that users should always have
error-handling around their calls to Solr.

But even so I'm leery of changing the semantics here.  I think the
assumption of most folks is that each entry returned by a "list" exists
fully, unless the response gives more granular info to augment that.  I'd
worry that returning partially-created or partially-deleted collections
would be confusing and unintuitive to most users.  (e.g. Imagine iterating
over a "list", getting a not-found error running some operation on one of
the entries, but still seeing the collection when you call "list" again to
double-check.)
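
To make that scenario concrete, here is a minimal sketch of the kind of
tolerant loop a caller would need. It does not use real Solr or SolrJ calls:
CollectionNotFound, apply_to_collections, fake_list and fake_operation are
all hypothetical stand-ins, and the simulated "half_deleted" collection is an
assumption made up for illustration.

```python
class CollectionNotFound(Exception):
    """Hypothetical error: the collection was listed but is not usable."""


def apply_to_collections(list_collections, operation):
    """Run `operation` on every listed collection, tolerating missing ones.

    Both arguments are callables supplied by the caller (stand-ins for
    whatever client calls are really used). Returns two lists of names:
    the collections that succeeded and the ones that were skipped.
    """
    succeeded, skipped = [], []
    for name in list_collections():
        try:
            operation(name)
        except CollectionNotFound:
            # Listed but unusable: possibly partially created or deleted.
            skipped.append(name)
        else:
            succeeded.append(name)
    return succeeded, skipped


# Simulated cluster (assumption): "half_deleted" is still returned by the
# listing even though operations on it fail with a not-found error.
def fake_list():
    return ["products", "half_deleted", "logs"]


def fake_operation(name):
    if name == "half_deleted":
        raise CollectionNotFound(name)


done, skipped = apply_to_collections(fake_list, fake_operation)
print("succeeded:", done)
print("skipped:", skipped)
```

Note that in this simulation a second call to fake_list() still returns
"half_deleted", which is exactly the part I'd expect users to find confusing.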

I understand the need for a more scalable API, or a way to detect orphaned
data in ZK.  But I'd personally rather not see us change the LIST semantics
to accomplish that.  If you need the ZK child nodes, is there maybe a
scalable way to invoke ZookeeperInfoHandler to get that information?

Best,

Jason

On Fri, Jan 26, 2024 at 2:46 PM David Smiley <dsmi...@apache.org> wrote:

> https://issues.apache.org/jira/browse/SOLR-16909
> > Collections LIST command should fetch ZK data, not cached state
>
> I want to get further input from folks that changing the semantics is
> okay.  If the change is applied, LIST will be much faster but it will
> return collections that have not yet been fully constructed or
> deleted.  If a client/caller/user lists collections and then loops
> them to take some action on them, it needs to be tolerant of the
> collection not working; may seem to not exist.  I argue callers should
> *already* behave in this way or it may be brittle to circumstances
> that are hard to reason about.  On the other hand, maybe this would
> increase the frequency of errors to existing clients that didn't
> encounter this in testing?  Shrug.  I could imagine ways to solve this
> but it would add some complexity and it's not clear it's worthwhile.
>
> A related aside: the method ClusterStatus.getCollectionsMap is not
> scalable for clusters with 10K+ collections because it loops every
> collection to fetch the latest state from ZK, putting a massive load
> on ZK.  Our implementation of collection listing calls it, as do a
> number of places across Solr.  Some could be changed with relative
> ease; some are more thorny.  I'd love to rename this thing, putting
> "slow" in the name so that you think twice before calling it :-)
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
