Simon Strandman wrote:
Hi!
Some binary distros like Mandrake and SUSE patch their glibc with
x86_64-optimized string functions and an x86_64-optimized libm to improve
performance.
I tried extracting those patches from a Mandrake SRPM and adding them to
the glibc 2.3.5 ebuild. The x86_64-optimized strings patch built and
worked perfectly and gave a large speedup, as you can see below. But I
couldn't get glibc to build with the libm patch because of unresolved
symbols (and I'm no programmer, so I have no idea how to fix that).
I found a small C program on a SUSE mailing list to measure glibc
memory copy performance:
http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html
With the glibc 2.3.5 currently in Gentoo I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 1291.600098 MBytes / sec. Block size = 1048576.
But with glibc 2.3.5 + the x86_64-optimized strings patch I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 2389.321777 MBytes / sec. Block size = 1048576.
That's an improvement of over 1000 MB/s (roughly 1.85x)! SUSE 9.3 also
gives about 2300 MB/s out of the box.
How about adding these patches to Gentoo? Perhaps in glibc 2.3.5-r1
before it leaves package.mask? I'll file a bug report about it if you
agree!
This .tar.bz2 contains the glibc directory from my overlay with the
Mandrake patches included in files/mdk, but the libm patches are
commented out in the ebuild.
http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2
There is a bug in the original memcpy.c that will cause a segfault if
you don't pass it any parameters. Here is a fixed version. I've left
everything else alone (except for a spelling correction).
// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

/* timing function: read the CPU time-stamp counter (x86/x86_64 only) */
#define rdtscll(val) do { \
    unsigned int a,d; \
    asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
    (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
    int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
    unsigned char *send_block_p, *rcv_block_p;
    unsigned long start_time, end_time;
    float rate;
    unsigned long *s_p, *r_p;

    /* All three arguments are required. */
    if (argc != 4) {
        fprintf(stderr,
            "Usage: %s <cpu clk rate (MHz)> <num. iterations> <copy block size>\n",
            argv[0] );
        return 1;
    }

    cpu_rate = atoi(argv[1]);
    num_loops = atoi(argv[2]);
    block_size = atoi(argv[3]);

    /* Round the block size down to a whole number of unsigned longs. */
    block_size_lwords = block_size / sizeof(unsigned long);
    block_size = sizeof(unsigned long) * block_size_lwords;

    send_block_p = malloc(block_size);
    rcv_block_p = malloc(block_size);
    if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
        fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n",
            block_size);
        return 1;   /* do not go on to copy through a NULL pointer */
    }

    // start_time = clock();
    rdtscll(start_time);

    for (i = 0; i < num_loops; i++) {
        memcpy(rcv_block_p, send_block_p, block_size);
        // s_p = (unsigned long *) send_block_p;
        // r_p = (unsigned long *) rcv_block_p;
        //
        // for (j = 0 ; j < block_size_lwords; j++) {
        //     *(r_p++) = *(s_p++);
        // }
    }

    // end_time = clock();
    rdtscll(end_time);

    /* bytes per cycle * cycles per second (cpu_rate MHz * 1.0E6),
       divided by 1.0E6 bytes per MByte */
    rate = (float) (block_size) * (float) (num_loops) /
        ((float) (end_time - start_time)) *
        ((float) cpu_rate) * 1.0E6 / 1.0E6;

    fprintf(stdout,
        "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n",
        rate, block_size);

    free(send_block_p);
    free(rcv_block_p);
    return 0;
} /* end main() */
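
One caveat with the program above: the reported rate scales directly with
the clock rate you pass as the first argument, so a wrong MHz value gives a
wrong result. As a rough sketch only (this is not part of the original
program), the same measurement could be taken with clock_gettime(), which
needs no clock rate at all. The function name copy_rate_mb_per_s and the
pre-touching of the buffers with memset() are my own assumptions for
illustration, and the empty asm statement is a GCC/Clang-specific compiler
barrier. On older glibc versions you may need to link with -lrt.

```c
/* copy_rate.c - hypothetical variant: time memcpy() with clock_gettime() */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() in strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Returns the copy rate in MBytes/sec (1 MByte = 1.0E6 bytes, as in
   memcpy.c above), or a negative value on error. */
double copy_rate_mb_per_s(size_t block_size, int num_loops)
{
    unsigned char *src, *dst;
    struct timespec t0, t1;
    double seconds;
    int i;

    if (block_size == 0 || num_loops <= 0)
        return -1.0;

    src = malloc(block_size);
    dst = malloc(block_size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers so their pages are mapped before timing starts
       (an assumption of this sketch; the original does not do this). */
    memset(src, 0xA5, block_size);
    memset(dst, 0, block_size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < num_loops; i++) {
        memcpy(dst, src, block_size);
        /* Compiler barrier so the copy cannot be optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    seconds = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1.0E9;

    free(src);
    free(dst);

    if (seconds <= 0.0)
        return -1.0;
    return (double)block_size * (double)num_loops / seconds / 1.0E6;
}
```

Calling copy_rate_mb_per_s(1048576, 1000) from a small main() corresponds to
the 1 MiB, 1000-iteration runs quoted earlier, though the numbers will not
be directly comparable with the rdtsc-based ones.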
--
"Pluralitas non est ponenda sine necessitate" - W. of O.