Re: [gentoo-amd64] x86_64 optimization patches for glibc.

Matt Randolph Sat, 23 Jul 2005 15:16:01 -0700

Simon Strandman wrote:

Hi!
Some binary distros like Mandrake and SUSE patch their glibc with x86_64-optimized strings and an x86_64-optimized libm to improve performance.
I tried extracting those patches from a Mandrake SRPM and adding them to the glibc 2.3.5 ebuild. The x86_64-optimized strings patch built and worked perfectly and gave a large speedup, as you can see below. But I couldn't get glibc to build with the libm patch because of unresolved symbols (and I'm no programmer, so I have no idea how to fix that).
I found a small C program on a SUSE mailing list to measure glibc memory copy performance:
http://lists.suse.com/archive/suse-amd64/2005-Mar/0220.html

With the glibc 2.3.5 currently in gentoo I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 1291.600098 MBytes / sec. Block size = 1048576.
But with glibc 2.3.5 + amd64 optimized strings I get:
isidor ~ # ./memcpy 2200 1000 1048576
Memory to memory copy rate = 2389.321777 MBytes / sec. Block size = 1048576.
That's an improvement of over 1000 MB/s! SUSE 9.3 also gives about 2300 MB/s out of the box.
How about adding these patches to Gentoo? Perhaps in glibc 2.3.5-r1 before it leaves package.mask? I'll create a bug report about it if you agree!
This .tar.bz2 contains the glibc directory from my overlay with the Mandrake patches included in files/mdk, but the libm patches are commented out in the ebuild.
http://snigel.no-ip.com/~nxsty/linux/glibc.tar.bz2

There is a bug in the original memcpy.c that will cause a segfault if you don't pass it any parameters. Here is a fixed version. I've left everything else alone (except for a spelling correction).


// memcpy.c - Measure how fast we can copy memory

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>

/* timing function */
#define rdtscll(val) do { \
    unsigned int a,d; \
    asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
    (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
} while(0)

int main(int argc, char *argv[]) {
 int cpu_rate, num_loops, block_size, block_size_lwords, i, j;
 unsigned char *send_block_p, *rcv_block_p;
 unsigned long start_time, end_time;
 float rate;
 unsigned long *s_p, *r_p;

 if (argc != 4) {
   fprintf(stderr,
     "Usage: %s <cpu clk rate (MHz)> <num. iterations> <copy blocksize>\n",
     argv[0]);
   return 1;
 }

 cpu_rate = atoi(argv[1]);
 num_loops = atoi(argv[2]);
 block_size = atoi(argv[3]);

 block_size_lwords = block_size / sizeof(unsigned long);
 block_size = sizeof(unsigned long) * block_size_lwords;

 send_block_p = malloc(block_size);
 rcv_block_p = malloc(block_size);

 if ((send_block_p == NULL) || (rcv_block_p == NULL)) {
   fprintf(stderr, "Malloc failed to allocate block(s) of size %d.\n",
           block_size);
   return 1; /* don't fall through and memcpy() through a NULL pointer */
 }

// start_time = clock();
   rdtscll(start_time);

 for (i = 0; i < num_loops; i++) {
   memcpy(rcv_block_p, send_block_p, block_size);

// s_p = (unsigned long *) send_block_p;
// r_p = (unsigned long *) rcv_block_p;
//
// for (j = 0 ; j < block_size_lwords; j++) {
// *(r_p++) = *(s_p++);
// }
 }

// end_time = clock();
   rdtscll(end_time);

 rate = (float) (block_size) * (float) (num_loops) /
        ((float) (end_time - start_time)) *
        ((float) cpu_rate) * 1.0E6 / 1.0E6;

 fprintf(stdout,
   "Memory to memory copy rate = %f MBytes / sec. Block size = %d.\n",
   rate, block_size);

} /* end main() */


--
"Pluralitas non est ponenda sine necessitate" - W. of O.

--
[email protected] mailing list
