Re: Collections LIST semantics

David Smiley Mon, 29 Jan 2024 11:23:17 -0800

Yeah, I'm sympathetic to that viewpoint.  I was coming at this from
Walter's viewpoint -- clients must always be tolerant.  This mindset is
important when working on scalable distributed systems.  But depending
on clients being so tolerant leads to being less friendly --
increasing the likelihood that they will have to deal with such
errors.  Solr might even appear buggy to such a client/user.  Shrug.


At work we've got a modification that adds a listAll parameter to collection
listing (so the semantics can be toggled), but for scalability reasons
we're finding we want this enabled everywhere, which raises the question
of whether it should simply work this way to begin with.  I'm also motivated
to contribute to Solr without adding complexity -- arguably listing
collections shouldn't need any parameters.  But we could contribute it
this way; okay?  And maybe make listAll's default be a system property
so you can run Solr in this way.
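
For what it's worth, the tolerant client loop discussed in this thread might
look roughly like the sketch below.  It is an illustration only: the
CollectionNotFound exception, apply_to_collections, and the fake_* helpers are
hypothetical names made up for this example and are not part of Solr or any
Solr client library.

```python
class CollectionNotFound(Exception):
    """Hypothetical error for a collection that is listed but not usable."""


def apply_to_collections(list_collections, action):
    """Run `action` on every listed collection, skipping ones that are gone.

    Both arguments are placeholder callables standing in for whatever
    client calls are actually used.  Returns (succeeded, skipped) name lists.
    """
    succeeded, skipped = [], []
    for name in list_collections():
        try:
            action(name)
        except CollectionNotFound:
            # Listed but unusable: partially created or partially deleted.
            skipped.append(name)
        else:
            succeeded.append(name)
    return succeeded, skipped


# Simulated cluster: "c2" shows up in the listing but is not fully created.
def fake_list():
    return ["c1", "c2", "c3"]


def fake_action(name):
    if name == "c2":
        raise CollectionNotFound(name)


ok, skipped = apply_to_collections(fake_list, fake_action)
print(ok, skipped)
```

The point is only that a missing collection is recorded and skipped rather
than aborting the whole loop; how a real client detects "not found" depends
on the client in use.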

On Mon, Jan 29, 2024 at 1:42 PM Jason Gerlowski <gerlowsk...@gmail.com> wrote:
>
> Thanks for calling this out more explicitly; definitely worth discussing.
>
> > If a client/caller/user lists collections and then loops over them to take
> some action on them, it needs to be tolerant of a collection not working;
> it may seem to not exist.
>
> I'd go even a step further and say that users should always have
> error-handling around their calls to Solr.
>
> But even so I'm leery of changing the semantics here.  I think the
> assumption of most folks is that each entry returned by a "list" exists
> fully, unless the response gives more granular info to augment that.  I'd
> worry that returning partially-created or partially-deleted collections
> would be confusing and unintuitive to most users.  (e.g. Imagine iterating
> over a "list", getting a not-found error running some operation on one of
> the entries, but still seeing the collection when you call "list" again to
> double-check.)
>
> I understand the need for a more scalable API, or a way to detect orphaned
> data in ZK.  But I'd personally rather not see us change the LIST semantics
> to accomplish that.  If you need the ZK child nodes, is there maybe a
> scalable way to invoke ZookeeperInfoHandler to get that information?
>
> Best,
>
> Jason
>
> On Fri, Jan 26, 2024 at 2:46 PM David Smiley <dsmi...@apache.org> wrote:
>
> > https://issues.apache.org/jira/browse/SOLR-16909
> > > Collections LIST command should fetch ZK data, not cached state
> >
> > I want to get further input from folks that changing the semantics is
> > okay.  If the change is applied, LIST will be much faster but it will
> > return collections that have not yet been fully constructed or
> > deleted.  If a client/caller/user lists collections and then loops over
> > them to take some action on them, it needs to be tolerant of a
> > collection not working; it may seem to not exist.  I argue callers should
> > *already* behave in this way or it may be brittle to circumstances
> > that are hard to reason about.  On the other hand, maybe this would
> > increase the frequency of errors to existing clients that didn't
> > encounter this in testing?  Shrug.  I could imagine ways to solve this
> > but it would add some complexity and it's not clear it's worthwhile.
> >
> > A related aside: the method ClusterStatus.getCollectionsMap is not
> > scalable for clusters with 10K+ collections because it loops every
> > collection to fetch the latest state from ZK, putting a massive load
> > on ZK.  Our implementation of collection listing calls it, as do a
> > number of places across Solr.  Some could be changed with relative
> > ease; some are more thorny.  I'd love to rename this thing, putting
> > "slow" in the name so that you think twice before calling it :-)
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
> >
