Hi,

The purpose of this mail is to share some things I've been trying at work 
(some could argue playing with) in case they are useful to any of you out 
there.

        I won't describe all the issues I've had during compilation, 
configuration and functional/performance testing, nor ask you for help. 
Rather, I'll describe what I've done and document one of the last problems I 
had, which kept me awake a few nights (a segmentation fault).

        For the past 4 weeks I have been evaluating whether FreeRADIUS can be 
used as the AAA server in a UMTS network with a large number of subscribers for 
GPRS data services. By "can be used" I essentially mean whether it can 
handle:

(1) Functionality: basic authentication/authorization/accounting, IP address 
allocation, and storage of some GPRS-attribute-to-IP-address mappings.

(2) High Availability (no single point of failure HW/SW)

(3) Distributed Architecture (performance target of 250 requests/second at 
peak hour, at a reasonable HW/SW cost)
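        To make the interplay between (2) and (3) concrete: 250 requests/second 
sustained over a peak hour is 900,000 requests, and if the load is spread 
evenly over the nodes (an assumption), each of two nodes sees about 125 
requests/second normally but must absorb all 250 on its own while the other 
one is down. The helpers below are a hypothetical sizing sketch (the function 
names are illustrative and not part of FreeRADIUS or MySQL):

```c
#include <assert.h>

/* Hypothetical sizing helper (not part of FreeRADIUS): the request rate each
 * node must sustain so that the cluster still meets target_rps while `failed`
 * nodes are down, ASSUMING the load is spread evenly over the survivors.
 * Returns a negative value when no node is left to serve requests. */
static double required_rate_per_node(double target_rps, int nodes, int failed)
{
    assert(nodes > 0 && failed >= 0);
    int surviving = nodes - failed;
    if (surviving <= 0)
        return -1.0;
    return target_rps / surviving;
}

/* Total number of requests handled during one hour at a constant rate. */
static double requests_per_hour(double rps)
{
    return rps * 3600.0;
}
```

For example, with three nodes each one would need to sustain about 83 
requests/second normally and 125 with one node down.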


        For this test I decided to use the following (all 32-bit, because I had 
problems getting things to compile as 64-bit on SPARC against the binaries 
distributed by MySQL):

(a) Solaris 8 on SPARC (chosen because these machines were pretty much idle at 
my company; similar tests were run on x86 PCs running Fedora Core 4 Linux).

(b) MySQL 5.0.21 (MAX version) 32 bit SPARC binary distribution.

(c) FreeRADIUS 1.1.1 (originally 1.1.0, but I upgraded because of bugs in the 
dictionary, following a recommendation from Alan DeKok in the mail archives).

(d) For IP allocation I'm using the rlm_sqlippool module, as per Alan DeKok's 
recommendation (mail archives). Its version is hard to tell because, as far as 
I could see, it is not version controlled; I got it from a Russian website. It 
will require some customization, as I want to be able to define IP pools made 
up of several (not just one) start/end IP ranges.
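        A minimal sketch of what such a multi-range pool could look like in 
memory, assuming IPv4 addresses stored as 32-bit integers in host byte order 
and inclusive start/end bounds. The names struct ip_range, struct ip_pool, 
pool_size, pool_address_at and pool_contains are hypothetical and not part of 
rlm_sqlippool; the point is only that a pool becomes a list of ranges that can 
be counted, indexed and searched as a single address space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool layout: several inclusive start/end ranges of IPv4
 * addresses in host byte order. Not the rlm_sqlippool data model. */
struct ip_range {
    uint32_t start;
    uint32_t end;
};

struct ip_pool {
    const struct ip_range *ranges;
    size_t nranges;
};

/* Total number of addresses in the pool (64-bit so it cannot overflow). */
static uint64_t pool_size(const struct ip_pool *pool)
{
    uint64_t total = 0;
    for (size_t i = 0; i < pool->nranges; i++) {
        assert(pool->ranges[i].start <= pool->ranges[i].end);
        total += (uint64_t)pool->ranges[i].end - pool->ranges[i].start + 1;
    }
    return total;
}

/* Map a 0-based index, counted across all ranges in order, to an address.
 * Returns 0 on success, -1 if the index is past the end of the pool. */
static int pool_address_at(const struct ip_pool *pool, uint64_t index,
                           uint32_t *addr)
{
    for (size_t i = 0; i < pool->nranges; i++) {
        uint64_t len = (uint64_t)pool->ranges[i].end - pool->ranges[i].start + 1;
        if (index < len) {
            *addr = pool->ranges[i].start + (uint32_t)index;
            return 0;
        }
        index -= len; /* skip this whole range */
    }
    return -1;
}

/* Returns 1 if addr falls inside any range of the pool, 0 otherwise. */
static int pool_contains(const struct ip_pool *pool, uint32_t addr)
{
    for (size_t i = 0; i < pool->nranges; i++)
        if (addr >= pool->ranges[i].start && addr <= pool->ranges[i].end)
            return 1;
    return 0;
}
```

For instance, a pool made of 10.0.0.1-10.0.0.254 plus 10.0.10.0-10.0.10.15 
holds 270 addresses, and index 254 maps to 10.0.10.0, the first address of the 
second range.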


        The test bed is basically two physical nodes, each running the same 
software, i.e. radiusd, mysqld and ndbd (the MySQL Cluster storage engine 
process). The NASes (in UMTS these are called GGSNs) will load-balance the 
requests, either directly, through an IP load balancer, or even through a 
FreeRADIUS proxy; I haven't decided which yet.
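        Whichever of those options ends up doing the load balancing, the basic 
idea is round-robin with failover: send each request to the next server in 
turn and skip servers considered dead. The sketch below is hypothetical 
(struct radius_server, struct server_pool and pick_server are illustrative 
names, and how a server gets marked dead, e.g. after a timeout, is left out); 
it is not how any particular GGSN, load balancer or FreeRADIUS proxy 
implements it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-side view of the RADIUS servers a NAS can use. */
struct radius_server {
    const char *name;
    int alive;          /* 1 if considered up, 0 if marked dead */
};

struct server_pool {
    struct radius_server *servers;
    size_t count;
    size_t next;        /* index to try first for the next request */
};

/* Pick the next live server in round-robin order, skipping dead ones.
 * Returns NULL if every server is marked dead. */
static struct radius_server *pick_server(struct server_pool *pool)
{
    assert(pool->count > 0);
    for (size_t tried = 0; tried < pool->count; tried++) {
        size_t i = (pool->next + tried) % pool->count;
        if (pool->servers[i].alive) {
            /* Resume after the server actually chosen next time. */
            pool->next = (i + 1) % pool->count;
            return &pool->servers[i];
        }
    }
    return NULL;
}
```

With one node down every request goes to the survivor, which is exactly the 
situation requirement (2) has to cover.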

        This configuration allows both vertical (bigger machines) and 
horizontal (more machines) scalability, by adding more CPUs or extra nodes to 
the cluster respectively. I have tested vertical scalability, and it scales 
linearly with CPU utilization. Horizontal scalability will be tested in the 
coming days (it is hard to get hold of the required HW). I will publish some 
results (more quantitative than this email) then.

        Last but not least (and related to the subject of this email), I found 
a bug in my copy of rlm_sqlippool (whose version, as I mentioned, is hard to 
tell): during load testing, under the right circumstances (multiple NASes, 
Solaris, the MySQL Cluster storage engine only, and high CPU utilization), the 
'radiusd' process dumped core.

        The problem occurred in the post-authorization phase of the sqlippool 
module, while retrieving the result of the 'allocate-find' SQL statement: the 
expected result row (just one row, with a single field containing the IP 
address to allocate) held invalid memory references. A row is modelled as an 
array of references to result columns, and the only reference was invalid, 
which caused the segmentation fault.

        After looking at the code and debugging it for a while, I noticed that 
the memory holding the result set was being released before it was used 
(although a reference to the first and only row had been kept earlier), hence 
the unpredictable results.

        Anyhow, the fix was simply to move the 'sql_finish_select_query' 
function call (which indirectly calls the MySQL function 'mysql_free_result' to 
release the memory allocated to the result set) a few lines down in the 
'sqlippool_query1' function, which retrieves the IP address to be allocated, 
in the 'rlm_sqlippool.c' file. See below for details:

1       /*
2        * Query the database expecting a single result row
3        */
4       static int sqlippool_query1(char * out, int outlen, const char * fmt, SQLSOCK * sqlsocket, void * instance,
5                                   REQUEST * request, char * param, int param_len)
6       {
7               rlm_sqlippool_t * data = (rlm_sqlippool_t *) instance;
8               char expansion[MAX_STRING_LEN * 4];
9               char query[MAX_STRING_LEN * 4];
10              SQL_ROW row;
11              int r;
12      
13              sqlippool_expand(expansion, sizeof(expansion), fmt, instance, param, param_len);
14      
15              /*
16               * Do an xlat on the provided string
17               */
18              if (request) {
19                      if (!radius_xlat(query, sizeof(query), expansion, request, NULL)) {
20                              radlog(L_ERR, "sqlippool_command: xlat failed.");
21                              out[0] = '\0';
22                              return 0;
23                      }
24              }
25              else {
26                      strcpy(query, expansion);
27              }
28      
29      #if 0
30              DEBUG2("sqlippool_query1: '%s'", query);
31      #endif
32      
33              if (rlm_sql_select_query(sqlsocket, data->sql_inst, query)){
34                      radlog(L_ERR, "sqlippool_query1: database query error");
35                      out[0] = '\0';
36                      return 0;
37              }
38      
39              r = rlm_sql_fetch_row(sqlsocket, data->sql_inst);
40      
41              if (r) {
42                      DEBUG("sqlippool_query1: SQL query did not succeed");
43                      out[0] = '\0';
44                      return 0;
45              }
46      
47              row = sqlsocket->row;   /* only a reference into the result set */
48              if (row == NULL) {
49                      DEBUG("sqlippool_query1: SQL query did not return any results");
50                      out[0] = '\0';
51                      return 0;
52              }
53      
54              if (row[0] == NULL){
55                      DEBUG("sqlippool_query1: row[0] returned NULL");
56                      out[0] = '\0';
57                      return 0;
58              }
59      
60              r = strlen(row[0]);
61              if (r >= outlen){
62                      DEBUG("sqlippool_query1: insufficient string space");
63                      out[0] = '\0';
64                      return 0;
65              }
66      
67              strncpy(out, row[0], r);
68              out[r] = '\0';
69      
70              (data->sql_inst->module->sql_finish_select_query)(sqlsocket, data->sql_inst->config);   /* frees the result set: must come after the last use of row */
71      
72              return r;
73      }


        Line 70 was originally right after line 39, i.e. right after keeping a 
reference to the first (and only) result row. The problem is that the row is a 
reference to references to memory allocated by the MySQL C API, which is 
released whenever 'mysql_free_result' is called. The problem only showed up 
under certain conditions that are hard to recreate, presumably because freed 
memory usually keeps its old contents until something else reuses it.
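        The general rule is to copy whatever you need out of the row before 
releasing the result set. The self-contained sketch below illustrates that 
ordering with hypothetical stand-in types (fake_row, struct fake_result, 
fake_store_result, fake_free_result and fetch_first_column are made up for the 
example and are not the MySQL C API) in which, as with a MySQL row, the row 
only points into memory owned by the result set:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: the row is an array of pointers into memory owned
 * by the result set, mimicking a MySQL row. NOT the MySQL C API. */
typedef char **fake_row;

struct fake_result {
    char *data;     /* buffer owning the column value */
    fake_row row;   /* row[0] points into data */
};

/* Build a one-row, one-column result holding a copy of value. */
static struct fake_result *fake_store_result(const char *value)
{
    struct fake_result *res = malloc(sizeof *res);
    if (res == NULL)
        return NULL;
    res->data = malloc(strlen(value) + 1);
    res->row = malloc(sizeof *res->row);
    if (res->data == NULL || res->row == NULL) {
        free(res->data);
        free(res->row);
        free(res);
        return NULL;
    }
    strcpy(res->data, value);
    res->row[0] = res->data;
    return res;
}

/* Release everything; any fake_row obtained earlier now dangles. */
static void fake_free_result(struct fake_result *res)
{
    if (res == NULL)
        return;
    free(res->data);
    free(res->row);
    free(res);
}

/* Copy the first column into out, THEN free the result (on every path).
 * Returns the copied length, or 0 if there was nothing to copy or no room. */
static int fetch_first_column(struct fake_result *res, char *out, int outlen)
{
    fake_row row = res->row;   /* only a reference into res */
    int len;

    assert(outlen > 0);
    out[0] = '\0';
    if (row == NULL || row[0] == NULL) {
        fake_free_result(res);
        return 0;
    }
    len = (int)strlen(row[0]);
    if (len >= outlen) {
        fake_free_result(res);
        return 0;
    }
    memcpy(out, row[0], (size_t)len + 1);  /* copy while the memory is valid */
    fake_free_result(res);                 /* free only after the last use of row */
    return len;
}
```

Calling fake_free_result before the memcpy would reproduce the original bug: 
the copy would read freed memory, which may still hold the old value and so 
appear to work most of the time.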


        I'm done for now; more details will come later. Meanwhile I have a 
question: is the rlm_sqlippool module going to be part of a FreeRADIUS release 
in the near future, and if not, what would be the procedure to make that 
happen?

Thanks, and I hope I didn't take up too much of your time if you read the 
whole thing!

Cheers,
Alex.

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
