Simon Strandman wrote:

Hi!

Some binary distros like Mandrake and SUSE patch their glibc with x86_64-optimized string functions and an x86_64-optimized libm to improve performance.

I tried extracting those patches from a Mandrake SRPM and adding them to the glibc 2.3.5 ebuild. The x86_64-optimized strings patch built and worked perfectly and gave a large speedup, as you can see below. But I couldn't get glibc to build with the libm patch because of unresolved symbols (and I'm no programmer, so I have no idea how to fix that).

I found a small C program on a SUSE mailing list that measures glibc memory copy performance:
http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html

With the glibc 2.3.5 currently in Gentoo I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 1291.600098 MBytes / sec. Block size = 1048576.

But with glibc 2.3.5 + amd64 optimized strings I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 2389.321777 MBytes / sec. Block size = 1048576.

That's an improvement of over 1000 MB/s, roughly 1.85 times the original rate! SUSE 9.3 also gives about 2300 MB/s out of the box.

How about adding these patches to Gentoo? Perhaps in glibc 2.3.5-r1 before it leaves package.mask? I'll create a bug report about it if you agree!

This .tar.bz2 contains the glibc directory from my overlay with the Mandrake patches included in files/mdk, but the libm patches are commented out in the ebuild.
http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2

There is a bug in the original memcpy.c that will cause a segfault if you don't pass it any parameters. Here is a fixed version. I've left everything else alone (except for a spelling correction).

// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

/* timing function */
#define rdtscll(val) do { \
    unsigned int a,d; \
    asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
    (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
 int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
 unsigned char *send_block_p, *rcv_block_p;
 unsigned long start_time, end_time;
 float rate;
 unsigned long *s_p, *r_p;

 if (argc != 4) {
   fprintf(stderr,
"Usage: %s <cpu clk rate (MHz)> <num. iterations> <copy block size>\n",
          argv[0] );
   return 1;
 }

 cpu_rate = atoi(argv[1]);
 num_loops = atoi(argv[2]);
 block_size = atoi(argv[3]);

 block_size_lwords = block_size / sizeof(unsigned long);
 block_size = sizeof(unsigned long) * block_size_lwords;

 send_block_p = malloc(block_size);
 rcv_block_p = malloc(block_size);

 if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
   fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n",
           block_size);
   return 1;
 }

// start_time = clock();
   rdtscll(start_time);

 for (i = 0; i < num_loops; i++) {
   memcpy(rcv_block_p, send_block_p, block_size);

// s_p = (unsigned long *) send_block_p;
// r_p = (unsigned long *) rcv_block_p;
//
// for (j = 0 ; j < block_size_lwords; j++) {
// *(r_p++) = *(s_p++);
// }
 }

// end_time = clock();
   rdtscll(end_time);

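 /* bytes copied / elapsed cycles * cycles per microsecond (cpu_rate in MHz)
    gives bytes per microsecond, which equals MBytes / sec. */
 rate = (float) (block_size) * (float) (num_loops) /
        ((float) (end_time - start_time)) *
        ((float) cpu_rate) * 1.0E6 / 1.0E6;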

 fprintf(stdout,
   "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n",
   rate, block_size);

} /* end main() */
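
The program above relies on rdtsc and on the clock rate you pass on the command line, so a wrong MHz value skews the result. As a rough sketch only (not part of the original memcpy.c), the same measurement could be taken with the POSIX clock_gettime() call instead. The helper name copy_rate_mbytes_per_sec is made up for illustration, and on older glibc versions you may need to link with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c99 */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of the original memcpy.c.
 * Returns the copy rate in MBytes/sec (1 MByte = 1.0E6 bytes, matching the
 * original program), or -1.0 if allocation or the clock call fails. */
double copy_rate_mbytes_per_sec(size_t block_size, int num_loops)
{
    unsigned char *src, *dst;
    struct timespec start, end;
    volatile unsigned char sink;
    double seconds;
    int i, clock_ok;

    assert(block_size > 0 && num_loops > 0);

    src = malloc(block_size);
    dst = malloc(block_size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers so page faults happen before timing starts. */
    memset(src, 0xA5, block_size);
    memset(dst, 0, block_size);

    clock_ok = (clock_gettime(CLOCK_MONOTONIC, &start) == 0);
    for (i = 0; i < num_loops; i++) {
        memcpy(dst, src, block_size);
    }
    clock_ok = clock_ok && (clock_gettime(CLOCK_MONOTONIC, &end) == 0);

    /* Read one destination byte so the result of the copies is used. */
    sink = dst[block_size - 1];
    (void) sink;

    free(src);
    free(dst);

    if (!clock_ok) {
        return -1.0;
    }

    seconds = (double) (end.tv_sec - start.tv_sec)
            + (double) (end.tv_nsec - start.tv_nsec) / 1.0E9;
    if (seconds <= 0.0) {
        return -1.0;
    }

    return (double) block_size * (double) num_loops / seconds / 1.0E6;
}
```

Calling copy_rate_mbytes_per_sec(1048576, 1000) from a main() and printing the result with %f would correspond to the runs quoted above, without the clock rate argument. Numbers from the two timing methods will not necessarily match exactly.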

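The argc check above stops the segfault, but atoi() still silently turns bad input such as "abc" into 0. A minimal sketch of stricter argument parsing with strtol() follows. The helper parse_positive_int is hypothetical and is not used by the program above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the original memcpy.c.
 * Parses text as a decimal integer greater than zero.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_positive_int(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;
    }

    errno = 0;
    value = strtol(text, &end, 10);

    /* Reject overflow and trailing characters that are not digits. */
    if (errno == ERANGE || *end != '\0') {
        return -1;
    }

    /* Reject zero, negative values, and values that do not fit in an int. */
    if (value <= 0 || value > INT_MAX) {
        return -1;
    }

    *out = (int) value;
    return 0;
}
```

With something like this, main() could print the usage message and return 1 whenever any of the three arguments fails to parse, instead of running the benchmark with a zero block size or loop count.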

--
"Pluralitas non est ponenda sine necessitate" - W. of O.
