Re: LBSolrClient and "zombie" check at core level vs node level

David Smiley Mon, 20 Nov 2023 07:07:36 -0800

That's a really fine idea Hoss!

After reviewing LBSolrClient again, I think your proposal would best be a
new SolrClient subclass.  LBSolrClient has a fair amount of state tracking
but a failover-only client would track no state.  LBSolrClient might or
might not end up subclassing it.
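
As a rough sketch of what "failover-only, no state" could look like: the
code below is plain Java and deliberately avoids SolrJ types.  The
FailoverOnly class and the Attempt interface are hypothetical names used
for illustration only; a real client would extend SolrClient and would be
choosier about which failures are retryable.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of a stateless failover helper; not part of SolrJ. */
public class FailoverOnly {

    /** Stand-in for "send this request to one base URL" (assumed shape). */
    @FunctionalInterface
    public interface Attempt<T> {
        T send(String baseUrl) throws IOException;
    }

    /**
     * Tries each URL in the order given and returns the first successful
     * response. Nothing is remembered between calls: no zombie list and no
     * background pings. Every IOException is treated as retryable here,
     * which is a simplification.
     */
    public static <T> T request(List<String> urls, Attempt<T> attempt)
            throws IOException {
        IOException last = null;
        for (String url : urls) {
            try {
                return attempt.send(url);
            } catch (IOException e) {
                last = e; // fall through to the next URL
            }
        }
        if (last == null) {
            throw new IOException("no URLs supplied");
        }
        // Surface the final failure once every URL has been tried.
        throw new IOException("all " + urls.size() + " URLs failed", last);
    }
}
```

The caller (CloudSolrClient in this discussion) stays responsible for
passing only URLs it believes are live, so a dead node costs at most one
failed attempt per request rather than a tracked zombie entry.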


After some discussion with my colleagues, we might try an experiment that
attempts this and see how it goes.  I suspect it'll be a net positive.  It
takes some production bake time to really get confidence in something of
this nature.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Nov 16, 2023 at 1:55 PM Chris Hostetter <[email protected]>
wrote:

>
> I think it's worth remembering that LBSolrClient, and its design,
> pre-dates SolrCloud and all of the ZK plumbing we have to know when nodes
> & replicas are "live" ... it was written at a time when people had to
> manually specify the list of solr servers and cores themselves when sending
> requests.
>
> Then when SolrCloud was added, the "zk aware" CloudSolrClient logic was
> wrapped AROUND LBSolrClient -- CloudSolrClient already has some idea what
> nodes & replicas are "live" when it sends the request, but LBSolrClient
> doesn't so...
>
> : out when there's a wide problem.  I think that LBSolrClient ought to know
> : about the nodes and should try a node level healthcheck ping before
> : executing any core level requests.  Maybe if the healthcheck failed then
> : succeeded, and if all of a small sample of zombie cores there pass, assume
> : they will all pass (don't send pings to all).  Just a rough idea.
>
> ...I think it's worth considering an inverse idea: make it configurable
> (and probably change the default given the common use case is SolrCloud) to
> build a LBSolrClient that does *NO* zombie tracking at all -- it just
> continues to use the multiple URL options it's given for each request to
> retry on (certain types of failures).
>
> Leave the "live" node/replica tracking to the CloudSolrClient layer, and
> if there are code paths where it's possible CloudSolrClient is passing
> stale lists of replica URLs to LBSolrClient that it (should) already know
> are not alive (via zk watchers), let's treat those as bugs in
> CloudSolrClient and fix them.
>
>
>
> -Hoss
> http://www.lucidworks.com/
