On 03/05/2015 06:00 PM, Ruediger Pluem wrote:
>
>
> On 03/05/2015 05:29 PM, Eric Covener wrote:
>> On Thu, Mar 5, 2015 at 11:14 AM, Ruediger Pluem <[email protected]> wrote:
>>> I suspect that the worker was already set to error by a parallel thread /
>>> process and hence
>>> PROXY_WORKER_IS_USABLE(worker) is false and causes worker->s->error_time to
>>> be reset which causes the worker to be open
>>> for retry immediately. This has been the case since r104624
>>> (http://svn.apache.org/viewvc?view=revision&revision=r104624) 10,5 years
>>> ago and the commit messages offers no hint at
>>> least to be why we reset these values.
>>> Can anybody think of a good reason why we do this?
>>
>> that sequence sounds plausible
>
> So do you see any reason why we do this, or should we just stop doing so?
>
> Another thing:
>
> else {
> if (worker->s->retries) {
> /*
> * A worker came back. So here is where we need to
> * either reset all params to initial conditions or
> * apply some sort of aging
> */
> }
>
> The comment makes a wrong assumption as we might be here despite the
> connection failed because the worker was already in
> error. So we might need to split into two ifs above and separate the
> conditions where the connection failed and where we
> need to do something with the worker because it wasn't in error yet. How
> about:
>
Index: proxy_util.c
===================================================================
--- proxy_util.c (revision 1664261)
+++ proxy_util.c (working copy)
@@ -2887,14 +2887,15 @@
connected = 1;
}
+ if (PROXY_WORKER_IS_USABLE(worker)) {
/*
* Put the entire worker to error state if
* the PROXY_WORKER_IGNORE_ERRORS flag is not set.
* Altrough some connections may be alive
* no further connections to the worker could be made
*/
- if (!connected && PROXY_WORKER_IS_USABLE(worker) &&
- !(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
+ if (!connected) {
+ if (!(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
worker->s->error_time = apr_time_now();
worker->s->status |= PROXY_WORKER_IN_ERROR;
ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
@@ -2902,6 +2903,7 @@
APR_TIME_T_FMT "s",
worker->s->hostname, apr_time_sec(worker->s->retry));
}
+ }
else {
if (worker->s->retries) {
/*
@@ -2915,6 +2917,19 @@
}
return connected ? OK : DECLINED;
}
+ else {
+ /*
+ * The worker is in error likely done by a different thread / process
+ * e.g. for a timeout or bad status. We should respect this and should
+ * not continue with a connection via this worker even if we got one.
+ */
+ if (connected) {
+ apr_socket_close(conn->sock );
+ conn->sock = NULL;
+ }
+ return DECLINED;
+ }
+}
static apr_status_t connection_shutdown(void *theconn)
{
The above patch fixes my issue (same one as in previous post but without
whitespace changes for easier reading).
Objections? Otherwise I would commit.
Regards
Rüdiger