Hello to everyone.

In a previous thread
http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg33354.html
I described strange behavior in our large proxy setup. After running
the server in debug mode (radiusd -xxx) on our production systems, we
found the cause: the home server in our proxy setup was being marked
dead quite often during the day and, with a dead_time of 30 seconds,
every request that arrived within those 30 seconds was rejected.

Our proxy conf initially looked like the following:

  proxy server {

        synchronous = yes

        retry_delay = 0

        retry_count = 0

        dead_time = 30
        default_fallback = yes

        post_proxy_authorize = no

}

#######################################################################
#
#  Configuration for the proxy realms.
#
...

We first changed the dead_time to 0 so as to avoid marking the home
server dead in synchronous mode.
Additionally, we implemented the following patch (against version 1.1.6):

--- ./src/main/files.c.orig     2007-04-23 15:14:14.569932000 +0300
+++ ./src/main/files.c  2007-04-23 15:22:30.995686000 +0300
@@ -489,6 +489,15 @@
                        if (cl->last_reply > (( now - mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count ))) {
                                continue;
                        }
+                       /*
+                        * If we are running in synchronous proxy mode, there's no point marking the target
+                        * server(s) dead, since this should be done by the radius client
+                        */
+                       if (mainconfig.proxy_synchronous) {
+                               radlog(L_PROXY, "authentication server %s:%d for realm %s seems unresponsive.",
+                                       cl->server, port, cl->realm);
+                               continue;
+                       }

                        cl->active = FALSE;
                        cl->wakeup = now + mainconfig.proxy_dead_time;
@@ -498,6 +507,15 @@
                        if (cl->last_reply > (( now - mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count ))) {
                                continue;
                        }
+                       /*
+                        * If we are running in synchronous proxy mode, there's no point marking the target
+                        * server(s) dead, since this should be done by the radius client
+                        */
+                       if (mainconfig.proxy_synchronous) {
+                               radlog(L_PROXY, "accounting server %s:%d for realm %s seems unresponsive.",
+                                       cl->acct_server, port, cl->realm);
+                               continue;
+                       }

                        cl->acct_active = FALSE;
                        cl->acct_wakeup = now + mainconfig.proxy_dead_time;


The purpose of this patch is to not have the freeradius server mark
the home server dead when working in synchronous mode. We believe that
in synchronous operation it is a good idea to leave the job of marking
the server dead to the NAS client.

All the above actions solved our initial problems. However, after a
while we again noticed clients being rejected when they should not
have been.

The following code in request_list.c caught my attention:

/*
 *  Refresh a request, by using proxy_retry_delay, cleanup_delay,
 *  max_request_time, etc.
 *
 *  When walking over the request list, all of the per-request
 *  magic is done here.
 */
static int refresh_request(REQUEST *request, void *data)
{
...
(around line 1264 version 1.1.6)

        } else if (request->proxy && !request->proxy_reply) {
                /*
                 *  The request is NOT finished, but there is an
                 *  outstanding proxy request, with no matching
                 *  proxy reply.
                 *
                 *  Wake up when it's time to re-send
                 *  the proxy request.
                 *
                 *  But in synchronous proxy, we don't retry but we update
                 *  the next retry time as NAS has not resent the request
                 *  in the given retry window.
                 */
                if (mainconfig.proxy_synchronous) {
                        /*
                         *      If the retry_delay * count has passed,
                         *      then mark the realm dead.
                         */
                        if (info->now > (request->timestamp + (mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count))) {
                                rad_assert(request->child_pid == NO_SUCH_CHILD_PID);
                                request_reject(request);
                                
                                realm_disable(request->proxy->dst_ipaddr,
                                              request->proxy->dst_port);
                                request->finished = TRUE;
                                goto setup_timeout;
                        }
                        request->proxy_next_try = info->now + mainconfig.proxy_retry_delay;
                }
                difference = request->proxy_next_try - info->now;
        } else {
...

It seems that on some "strange" occasions the code enters the above
path. The check tests whether the current time is later than
request->timestamp + mainconfig.proxy_retry_delay *
mainconfig.proxy_retry_count. If it is, the request is rejected and
the code tries to disable the realm. However, the proxy.conf
configuration file says:

#  If you want to have the server send proxy retries ONLY when the NAS
#  sends it's retries to the server, then set this to 'yes', and
#  set the other proxy configuration parameters to 0 (zero).
#  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^       
#  Additionally, if you want 'failover' to work, the server must manage
#  retries and timeouts.  Therefore, if this is set to yes, then no
#  failover functionality is possible.
#
        synchronous = no

When proxy_retry_delay and proxy_retry_count are both zero, the check
reduces to info->now > request->timestamp, so the request is rejected
as soon as we enter the above code path with the clock past the
request's timestamp. I can't tell for sure whether there is a problem
in that code, but it does not look clean to me. To work around the
problem we simply gave nonzero values to these two configuration
parameters, so we ended up with the following proxy.conf:

  proxy server {

        synchronous = yes

        retry_delay = 15

        retry_count = 5

        dead_time = 0
        default_fallback = yes

        post_proxy_authorize = no

}

#######################################################################
#
#  Configuration for the proxy realms.
#
...

Please let me know your thoughts on these matters, and on the patch
above.

Thanks,

Kostas
- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
