[
https://issues.apache.org/jira/browse/SOLR-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884798#comment-15884798
]
Erick Erickson commented on SOLR-9818:
--------------------------------------
Amrit:
Seems like a good approach. Some questions/comments.
The admin UI still will have some sort of indication that the operation is
ongoing until the status is reported as completed, correct?
What happens if the response from Solr for the initial async request fails? The
scenario I want to avoid is:
> The async request is made and is received by Solr but for some mysterious
> reason the initial call reports a failure. At this point the request is being
> processed even though the initial call "failed".
> The admin UI issues another async request for the same action.
That's really the scenario now; just making things async won't change that if
we retry an initial "failed" request that was actually received by Solr.
We could do something like:
> make the async call.
> check the status whether the initial call succeeds or not
> If status found, spin until it completes. else ???
The else ??? case is where the gremlins are. Would it be safe to re-submit the
call if no status was found? Or just bail and report an error and suggest the
user examine their setup and "do the right thing"?
One thing I'm not very clear on is how long the status for an async call stays
around. Given that there's a DELETESTATUS API call, I'd guess forever. If
that's the case, perhaps it would be safe to re-submit the async call only if
the state was "notfound". We'd be assuming in that case that the initial call
was never received or acted upon.
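To make the "notfound" idea concrete, here is a minimal sketch in Python (for illustration only; the admin UI itself is JavaScript). It assumes the Collections API REQUESTSTATUS action and a JSON response that carries the state under status.state. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def request_status_url(base_url, request_id):
    """Build a REQUESTSTATUS URL for an async request id (hypothetical helper)."""
    query = urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    )
    return f"{base_url}/admin/collections?{query}"


def safe_to_resubmit(status_response):
    """Return True only when Solr reports no trace of the request.

    Assumes the parsed response looks like {"status": {"state": "..."}}.
    A missing or unknown state is treated as "not safe".
    """
    state = status_response.get("status", {}).get("state")
    return state == "notfound"
```

Anything other than an explicit "notfound" (including a response we cannot parse) is treated as "do not resubmit", which errs on the side of not queueing duplicate work.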
That said, though, I think the straw-man behavior I'd propose for discussion is:
> submit the async request
if (the initial call failed _or_ there was no status to be found) {
    report an error and suggest the user check their
    system before resubmitting the request. Bail out in this case,
    no retries, no attempt to drive on.
} else {
    put up a progress indicator while periodically
    checking the status. Continue spinning until we can report
    the final status.
}
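The straw-man above could be sketched roughly as follows, in Python for illustration only. The submit and get_state callables are hypothetical stand-ins for the initial async request and the status check; the state names assume what REQUESTSTATUS reports. This is a sketch for discussion, not a proposed implementation.

```python
import time

# States after which there is nothing left to wait for.
TERMINAL_STATES = {"completed", "failed"}

ERROR_MESSAGE = "error: check your system before resubmitting the request"


def run_async_action(submit, get_state, poll_interval=1.0, sleep=time.sleep):
    """Submit once, never retry, then poll until a terminal state.

    submit:    hypothetical callable, True if the initial call succeeded
    get_state: hypothetical callable, returns the current status state
    """
    if not submit():
        return ERROR_MESSAGE  # initial call failed: bail, no retries
    state = get_state()
    if state == "notfound":
        return ERROR_MESSAGE  # no status to be found: bail, no retries
    while state not in TERMINAL_STATES:
        sleep(poll_interval)  # the UI would show a progress indicator here
        state = get_state()
    return state
```

Note that the request is submitted exactly once on every path; the only loop is the status poll.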
FWIW,
Erick
> Solr admin UI rapidly retries any request(s) if it loses connection with the
> server
> -----------------------------------------------------------------------------------
>
> Key: SOLR-9818
> URL: https://issues.apache.org/jira/browse/SOLR-9818
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Admin UI
> Affects Versions: 6.3
> Reporter: Ere Maijala
>
> It seems that whenever the Solr admin UI loses connection with the server,
> whether because the server is too slow to answer or because it has gone away
> completely, it starts hammering the server with the previous request until it
> gets a success response. That can be especially bad if the last
> attempted action was something like collection reload with a SolrCloud
> instance. The admin UI will quickly add hundreds of reload commands to
> overseer/collection-queue-work, which may essentially cause the replicas to
> get overloaded when they're trying to handle all the reload commands.
> I believe the UI should never retry the previous command blindly when the
> connection is lost, but instead just ping the server until it responds again.
> Steps to reproduce:
> 1.) Fire up Solr
> 2.) Open the admin UI in browser
> 3.) Open a web console in the browser to see the requests it sends
> 4.) Stop Solr
> 5.) Try an action in the admin UI
> 6.) Observe the web console in browser quickly fill up with repeats of the
> originally attempted request
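The "ping until it responds" behavior suggested in the description could look something like the sketch below, in Python for illustration only. The ping callable is a hypothetical stand-in for a cheap health check, and the exponential backoff and attempt limit are assumptions added here, not something the issue specifies.

```python
import time


def wait_for_server(ping, initial_delay=1.0, max_delay=30.0,
                    max_attempts=10, sleep=time.sleep):
    """Ping with growing delays; never re-send the original command.

    ping: hypothetical callable returning True once the server answers.
    Returns the number of attempts used, or None if it never answered.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if ping():
            return attempt
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    return None
```

Only the lightweight ping is repeated, so a slow or absent server is not hit with duplicate reload commands.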
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)