That's not true for MSVCRT6: new and malloc call the same function. In both Borland and Watcom, I've heard that new actually gets thunked a bit before ultimately calling malloc anyway (so new manages memory just as efficiently, but is a hair slower).
In the multithreaded MSVCRT7, memory allocations (and releases) are preceded by a mutex lock, which requires a context switch to the kernel (and back). Because of this, allocation under Linux (at least under Red Hat 9, and likely under all Linux distributions) is many times faster than under Windows.
It sounds like the Apache allocator works directly with pages, though? (Or is the 4k just an optimal number?) In the case of the Apache web server I'd think the common case is repeatedly allocating and releasing similar-sized chunks of memory, and the example of an ever-growing string is a little unusual. If you optimize for the former, then when a block of memory is released you want to keep it around at its current size, because it's very likely another call will request the same size block again momentarily. If you optimize for the latter, then you want to combine smaller blocks into bigger blocks aggressively, or if that's not possible, ditch the smaller blocks, because it's likely that they will never be reused.
These two situations are at odds with each other. The former solution handles the latter situation pretty poorly. The latter solution would handle the former situation no worse than any other condition, but not as well as the former solution.
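To make the first strategy concrete, here is a rough sketch of "keep a released block around at its current size". It is only an illustration under my own assumptions: the names cache_alloc, cache_free, size_class and the 16..2048 byte size classes are made up, and this is not how Apache's pools or any particular CRT actually work. It does no coalescing at all, which is exactly why it would serve the ever-growing string poorly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size-class cache: freed blocks are kept on a per-size
 * free list and handed back to the next request of the same class.   */
#define NUM_CLASSES 8   /* classes of 16, 32, ..., 2048 bytes */
#define MIN_SHIFT   4   /* smallest class is 1 << 4 = 16 bytes */

typedef struct free_node { struct free_node *next; } FreeNode;

static FreeNode *free_lists[NUM_CLASSES];

/* Map a request size to a class index, or -1 if it is too big to cache. */
static int size_class(size_t n)
{
    int c = 0;
    size_t cap = (size_t)1 << MIN_SHIFT;
    while (cap < n && c < NUM_CLASSES) {
        cap <<= 1;
        c++;
    }
    return c < NUM_CLASSES ? c : -1;
}

/* Allocate n bytes, reusing a cached block of the same class if one exists. */
void *cache_alloc(size_t n)
{
    int c = size_class(n);
    if (c < 0) {
        return malloc(n);               /* too large: bypass the cache */
    }
    if (free_lists[c] != NULL) {
        FreeNode *node = free_lists[c]; /* pop the most recently freed block */
        free_lists[c] = node->next;
        return node;
    }
    return malloc((size_t)1 << (c + MIN_SHIFT));
}

/* Release a block of n bytes; cached blocks are never split or merged. */
void cache_free(void *p, size_t n)
{
    int c = size_class(n);
    if (p == NULL) {
        return;
    }
    if (c < 0) {
        free(p);
        return;
    }
    ((FreeNode *)p)->next = free_lists[c];  /* push onto its class list */
    free_lists[c] = (FreeNode *)p;
}
```

With this scheme, a cache_alloc(100) followed by cache_free and another request in the same 128-byte class gets the very same block back without touching malloc. The caller has to pass the size back to cache_free, which a real allocator would avoid by storing it in a block header.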
You can find a worst case for most any allocation scheme, but is it a real situation that you need to solve, or is this just an example of finding that worst case situation?
Andrew
Mark Rowe wrote:
c is smaller than c++ STL,
Using apache 2.0.48
I defined my own string struct and my own 'mystrcat' function, and defined tenbyte_string as "0123456789", which is ten bytes.
I ran this iteration 30,000 times as shown below. It ran fast and small. It built a string with a strlen() of 300,000. The child never grew more than about 300 KB beyond the size of the resting child process. All children dropped back down below 2 MB when they were done.
I hammered it! I held down the 'F5' key on this page on a high-bandwidth connection for 2 minutes or more.
Apache was fine: it was very fast and spawned many children. All children were small and returned to below 2 MB when I let up on the 'F5' key.
---------------------------------------------------------------------------------------------------------------
#define tenbyte_string "0123456789"
typedef struct mystr { char *p; int len; } String;
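/* Append str to s, growing the buffer to exactly the size needed. */
void mystrcat(String *s, const char *str)
{
    int str_len = strlen(str);
    char *tmp_str = realloc(s->p, s->len + str_len + 1);
    if (tmp_str == NULL) { return; }  /* out of memory: leave s unchanged */
    s->p = tmp_str;
    strcpy(s->p + s->len, str);
    s->len += str_len;
}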
void fun(void) { fprintf(stderr, "cannot create pool\n"); }
static int x_handler(request_rec *r) { int i; String s = {0};
if (strcmp(r->handler, "example-handler")) { return DECLINED; }
ap_set_content_type(r, "text/html");
if (r->header_only) { return OK; }
ap_rputs(DOCTYPE_HTML_3_2, r);
ap_rputs("<HTML>\n", r);
ap_rputs(" <HEAD>\n", r);
ap_rputs(" <TITLE>mod_example Module Content-Handler Output\n", r);
ap_rputs(" </TITLE>\n", r);
ap_rputs(" </HEAD>\n", r);
ap_rputs(" <BODY>\n", r);
ap_rputs(" <H1><SAMP>mod_example</SAMP> Module Content-Handler Output\n", r);
ap_rputs(" </H1>\n", r);
ap_rputs(" <P>\n", r);
ap_rprintf(r, " Apache HTTP Server version: \"%s\"\n",
ap_get_server_version());
ap_rputs(" <BR>\n", r);
ap_rprintf(r, " Server built: \"%s\"\n", ap_get_server_built());
ap_rputs("<h1>hello 30,000 world</h1><br>\n", r);
for (i = 0; i < 30000; i++) { mystrcat(&s, tenbyte_string); }
ap_rprintf(r, "strlen(s.p)=%d\n", (int) strlen(s.p)); free(s.p);
ap_rputs(" </BODY>\n", r); ap_rputs("</HTML>\n", r);
return OK; }
---------------------------------------------------------------------------------------------------------------
Then I put the above code into a single command-line program and set the iterations to 1,000,000 to make a 10,000,000-byte string. The program took 1.33 seconds and used 10,002,432 max mmap bytes according to malloc_stats();
int i = 0; String s2 = {0};
for (i = 0; i < 1e6; ++i) { mystrcat(&s2, tenbyte_string); }
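For anyone who wants to reproduce the C side, the fragment above can be fleshed out roughly as follows. This is a sketch, not the exact program that was timed: mystrcat_checked and build_string are hypothetical names introduced here, the length field is a size_t rather than an int, and a realloc failure check is added.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define tenbyte_string "0123456789"

typedef struct mystr { char *p; size_t len; } String;

/* Append str to s, growing the buffer to exactly the size needed.
 * Returns 0 on success, -1 if realloc fails (s is left unchanged). */
int mystrcat_checked(String *s, const char *str)
{
    size_t str_len = strlen(str);
    char *grown = realloc(s->p, s->len + str_len + 1);
    if (grown == NULL) {
        return -1;
    }
    memcpy(grown + s->len, str, str_len + 1);  /* copies the '\0' too */
    s->p = grown;
    s->len += str_len;
    return 0;
}

/* Append tenbyte_string `iterations` times to an empty String.
 * Returns 0 on success; on failure frees the buffer and returns -1. */
int build_string(String *s, long iterations)
{
    long i;
    assert(s != NULL && s->p == NULL && s->len == 0);
    for (i = 0; i < iterations; ++i) {
        if (mystrcat_checked(s, tenbyte_string) != 0) {
            free(s->p);
            s->p = NULL;
            s->len = 0;
            return -1;
        }
    }
    return 0;
}
```

Calling build_string(&s2, 1000000) on a zero-initialized String and then free(s2.p) corresponds to the loop above. On glibc, malloc_stats() from <malloc.h> can be called before exit to print the allocator totals.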
Then I wrote the equivalent code in C++ using the STL. It took 1.25 seconds and used 25,174,016 bytes according to malloc_stats();
int i = 0; string s = "";
for(i=0; i < 1e6; ++i) { s += tenbyte_string; }
So the C program took 1.33/1.25 = 1.06 times longer than the C++ program.
The C++ program took about 2.5 times as much RAM as the C program.
C++ is marginally faster but takes over twice as much RAM as C.
So, from this preliminary study, if Apache is written well enough in C, it will be very nearly as fast as if it were written in C++, and it will run in less than half the RAM.
My tests were run using gcc version 2.96 on Red Hat version 7.2, on an old Dell PowerApp 110 with 128 MB of RAM, 265 MB swap and a 600 MHz 80526 CPU. glibc was GNU C Library stable release version 2.2.4.
--- Mark R. Rowe, MSEE
