On 03/05/2015 05:29 PM, Eric Covener wrote:
> On Thu, Mar 5, 2015 at 11:14 AM, Ruediger Pluem <[email protected]> wrote:
>> I suspect that the worker was already set to error by a parallel thread /
>> process and hence
>> PROXY_WORKER_IS_USABLE(worker) is false and causes worker->s->error_time to
>> be reset which causes the worker to be open
>> for retry immediately. This has been the case since r104624
>> (http://svn.apache.org/viewvc?view=revision&revision=r104624) 10,5 years ago
>> and the commit messages offers no hint at
>> least to be why we reset these values.
>> Can anybody think of a good reason why we do this?
>
> that sequence sounds plausible
So do you see any reason why we do this, or should we just stop doing so?
Another thing:
else {
if (worker->s->retries) {
/*
* A worker came back. So here is where we need to
* either reset all params to initial conditions or
* apply some sort of aging
*/
}
The comment makes a wrong assumption as we might be here despite the connection
failed because the worker was already in
error. So we might need to split into two ifs above and separate the conditions
where the connection failed and where we
need to do something with the worker because it wasn't in error yet. How about:
Index: proxy_util.c
===================================================================
--- proxy_util.c (revision 1664261)
+++ proxy_util.c (working copy)
@@ -2887,33 +2887,48 @@
connected = 1;
}
- /*
- * Put the entire worker to error state if
- * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
- * Altrough some connections may be alive
- * no further connections to the worker could be made
- */
- if (!connected && PROXY_WORKER_IS_USABLE(worker) &&
- !(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
- worker->s->error_time = apr_time_now();
- worker->s->status |= PROXY_WORKER_IN_ERROR;
- ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
- "ap_proxy_connect_backend disabling worker for (%s) for %"
- APR_TIME_T_FMT "s",
- worker->s->hostname, apr_time_sec(worker->s->retry));
+ if (PROXY_WORKER_IS_USABLE(worker)) {
+ /*
+ * Put the entire worker to error state if
+ * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
+ * Altrough some connections may be alive
+ * no further connections to the worker could be made
+ */
+ if (!connected) {
+ if (!(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
+ worker->s->error_time = apr_time_now();
+ worker->s->status |= PROXY_WORKER_IN_ERROR;
+ ap_log_error(APLOG_MARK, APLOG_ERR, 0, s, APLOGNO(00959)
+ "ap_proxy_connect_backend disabling worker for (%s) for %"
+ APR_TIME_T_FMT "s",
+ worker->s->hostname, apr_time_sec(worker->s->retry));
+ }
+ }
+ else {
+ if (worker->s->retries) {
+ /*
+ * A worker came back. So here is where we need to
+ * either reset all params to initial conditions or
+ * apply some sort of aging
+ */
+ }
+ worker->s->error_time = 0;
+ worker->s->retries = 0;
+ }
+ return connected ? OK : DECLINED;
}
else {
- if (worker->s->retries) {
- /*
- * A worker came back. So here is where we need to
- * either reset all params to initial conditions or
- * apply some sort of aging
- */
+ /*
+ * The worker is in error likely done by a different thread / processi
+ * e.g. for a timeout or bad status. We should respect this and should
+ * not continue with a connection via this worker even if we got one.
+ */
+ if (connected) {
+ apr_socket_close(conn->sock );
+ conn->sock = NULL;
}
- worker->s->error_time = 0;
- worker->s->retries = 0;
+ return DECLINED;
}
- return connected ? OK : DECLINED;
}
static apr_status_t connection_shutdown(void *theconn)
>
>> Another question is if we shouldn't do
>>
>> worker->s->error_time = apr_time_now();
>>
>> also in case the worker is already in error state to restart the retry clock
>> as we just faced an error with connecting
>> to the backend.
>
> Isn't this already handled by the thread that put the worker in error
> during the race? If there's no race, we already have this line of code
> for the main path.
>
Valid point, but we might have an error situation here that is more recent then
the one by the thread that put it into
error. OTOH as I suspect a race the difference is likely to be rather slim, so
it might not be worth the effort.
Regards
Rüdiger