Overseer, expiring queued messages

David Smiley Wed, 31 Jan 2024 09:48:22 -0800

I have a proposal and am curious what folks think.  When the Overseer
dequeues an admin command message to process, imagine it being
enhanced to examine the "ctime" (creation time) of the ZK message node
to determine how long it has been enqueued, and thus roughly how long
the client has been waiting.  If it's greater than a configured
threshold (1 minute?), respond with a timeout-style error, such as
"Sorry, the Overseer is so backed up that we fear you have given up;
please try again".  This would not apply to an "async" style
submission.
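
To make the check concrete, here is a rough sketch in Java.  Everything
in it is hypothetical: the class, method, and constant names are made up
for illustration and are not existing Solr code, and the 60-second
default just mirrors the "1 minute?" guess above.  It assumes the
caller has already read the node's creation time in epoch milliseconds
(ZooKeeper exposes this as Stat.getCtime()) and knows whether the
command was submitted "async" style.

```java
/** Hypothetical sketch of the proposed check; not existing Solr code. */
final class QueuedMessageExpiry {

    /** Assumed default threshold, taken from the "1 minute?" suggestion. */
    static final long DEFAULT_MAX_QUEUE_AGE_MS = 60_000L;

    private QueuedMessageExpiry() {}

    /**
     * Decides whether a dequeued message should be rejected as stale.
     *
     * @param ctimeMs       creation time of the ZK message node, epoch millis
     * @param nowMs         current time as seen by the Overseer, epoch millis
     * @param maxQueueAgeMs configured threshold in millis
     * @param async         true if the command was an "async" style submission
     */
    static boolean shouldExpire(long ctimeMs, long nowMs,
                                long maxQueueAgeMs, boolean async) {
        if (async) {
            // Async callers are not blocked waiting, so never expire these.
            return false;
        }
        // Only a rough measure: ctime is stamped by ZooKeeper while "now"
        // comes from the Overseer's clock, so some skew is possible.
        long queuedForMs = nowMs - ctimeMs;
        return queuedForMs > maxQueueAgeMs;
    }

    /** Builds the timeout-style error text returned to the client. */
    static String timeoutMessage(long queuedForMs) {
        return "Overseer is backed up: command waited " + queuedForMs
            + " ms in the queue, so the client has likely given up;"
            + " please try again";
    }
}
```

The Overseer would call shouldExpire(...) right after dequeuing and, if
it returns true, write the timeout error as the response instead of
executing the command.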


Motivation:  Due to miscellaneous reasons at scale that are very user
/ situation dependent, the Overseer can get seriously backed up.  The
client, making a typical synchronous call to, say, create a
collection, may reach its timeout (say a minute) and give up.
Today, SolrCloud doesn't know this; it goes on its merry way and
creates a collection anyway.  Depending on how Solr is used, this can
be an orphaned collection that the client doesn't want anymore.  That
is to say, the client wants a collection but it wanted it at the time
it asked for it with the name it asked for at that time.  If it fails,
it will come back later and propose a new name.  This doesn't have to
be collection creation specific; I'm thinking that in principle it
doesn't really matter what the command is.  If it takes too long for
the Overseer to receive the message, just time out, basically.

Thoughts?

This wouldn't be a concern for the distributed mode of collection
processing as there is no queue bottleneck; the receiving node
processes the request immediately.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
