Niels,

After a closer review of the code, I found that unaligned copies were a lot slower than aligned ones. I've created another version of the routine that takes care of that. Attached to this email is a simple program that I used to test this code. It tests aligned and unaligned (src & dst) buffers with each of the 3 implementations (libc memcpy, rev1 armasm memcpy, and rev2 armasm memcpy).
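The contents of memtest.tar.bz2 aren't reproduced here, so the following is only a minimal sketch of that kind of test, not the attached program. It times one copy routine with the source and destination offset from an aligned base, which gives the 32bit/16bit/8bit cases. `copy_fn`, `time_copy` and `byte_copy` are hypothetical names, and `byte_copy` merely stands in for the armasm routines, which aren't shown.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() and CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Common signature for every copy routine under test (libc memcpy fits it). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for a second implementation: a plain byte loop. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * Time `iterations` copies of `len` bytes from src + src_off to dst + dst_off.
 * malloc() returns memory aligned for any type, so offsets 0, 2 and 1 give
 * 32-bit, 16-bit and 8-bit alignment respectively.
 * Returns elapsed seconds, or -1.0 if allocation fails, iterations == 0,
 * or an offset is not in 0..3.
 */
static double time_copy(copy_fn fn, size_t len, size_t src_off, size_t dst_off,
                        unsigned iterations)
{
    unsigned char *src = malloc(len + 4);
    unsigned char *dst = malloc(len + 4);
    double secs = -1.0;

    if (src != NULL && dst != NULL && iterations > 0 && src_off < 4 && dst_off < 4) {
        struct timespec t0, t1;

        for (size_t i = 0; i < len + 4; i++)
            src[i] = (unsigned char)(i * 7u);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned it = 0; it < iterations; it++)
            fn(dst + dst_off, src + src_off, len);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* Sanity check: the copy must reproduce the source bytes exactly. */
        assert(memcmp(dst + dst_off, src + src_off, len) == 0);

        secs = (double)(t1.tv_sec - t0.tv_sec) +
               (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    free(src);
    free(dst);
    return secs;
}
```

For example, `time_copy(memcpy, 4096, 1, 0, 100000)` would time libc memcpy with an 8-bit-aligned source and a 32-bit-aligned destination; calling it once per routine with the same arguments gives comparable numbers.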
Here is the output of the program running on an ARM9 AT91RM9200 with uClibc-0.9.30 and gcc-4.2.4 (armasm is rev1, armasm2 is rev2):

# ./memtest 500000
32bit src/dst Aligned test:
Testing libc (0x4005a008 <==> 0x40243008 : 500000): 2.996949 sec
Testing armasm (0x4005a008 <==> 0x40243008 : 500000): 1.331787 sec
Testing armasm2 (0x4005a008 <==> 0x40243008 : 500000): 1.358246 sec
The faster routine is armasm
16bit src/dst Aligned test:
Testing libc (0x4005a00a <==> 0x4024300a : 500000): 2.983215 sec
Testing armasm (0x4005a00a <==> 0x4024300a : 500000): 1.332214 sec
Testing armasm2 (0x4005a00a <==> 0x4024300a : 500000): 1.358978 sec
The faster routine is armasm
8bit src/dst Aligned test:
Testing libc (0x4005a009 <==> 0x40243009 : 500000): 2.982209 sec
Testing armasm (0x4005a009 <==> 0x40243009 : 500000): 1.331054 sec
Testing armasm2 (0x4005a009 <==> 0x40243009 : 500000): 1.359162 sec
The faster routine is armasm
16bit src Aligned test:
Testing libc (0x4005a00a <==> 0x40243008 : 500000): 2.983734 sec
Testing armasm (0x4005a00a <==> 0x40243008 : 500000): 2.571228 sec
Testing armasm2 (0x4005a00a <==> 0x40243008 : 500000): 1.419556 sec
The faster routine is armasm2
8bit src Aligned test:
Testing libc (0x4005a009 <==> 0x40243008 : 500000): 2.984101 sec
Testing armasm (0x4005a009 <==> 0x40243008 : 500000): 2.570343 sec
Testing armasm2 (0x4005a009 <==> 0x40243008 : 500000): 1.419525 sec
The faster routine is armasm2
16bit dst Aligned test:
Testing libc (0x4005a008 <==> 0x4024300a : 500000): 2.983948 sec
Testing armasm (0x4005a008 <==> 0x4024300a : 500000): 2.571563 sec
Testing armasm2 (0x4005a008 <==> 0x4024300a : 500000): 1.418671 sec
The faster routine is armasm2
8bit dst Aligned test:
Testing libc (0x4005a008 <==> 0x40243009 : 500000): 2.983521 sec
Testing armasm (0x4005a008 <==> 0x40243009 : 500000): 2.571258 sec
Testing armasm2 (0x4005a008 <==> 0x40243009 : 500000): 1.418762 sec
The faster routine is armasm2

As you can see, rev2 works a lot better with unaligned
buffers. I will update the DirectFB patch to include this new version of the routine.

As for big-endian: this version will ONLY work on little-endian targets, so a config directive will be needed to keep the build working on big-endian targets. I will include that in the patch.

In the meantime, it would be great if other people could send me some numbers to double-check my results.

Regards,
Vince

On Mon, 2009-03-23 at 16:36 +0100, Niels Roest wrote:
> Hi Vince,
> I'm happy to include the patch,
> I just have a few unclear points, hope somebody can clear them up..
>
> (1) memcpy is speed tested with (I think) aligned accesses (based on
> D_MALLOC addresses) but I think we'll see a lot of unaligned memcpy's
> too, but that side of the implementation looks kinda weak.. Anyone care
> to give some figures for unaligned copy? Have a look at
> direct_find_best_memcpy() in lib/direct/memcpy.c, and fiddle a bit with
> buf1 and buf2.
> (2) what happens on a big-endian ARM if I just include the patch? Having
> trouble finding this dependency in the patch.. Will need to fix this, or
> put a show stopper somewhere for big-endian, so the patch doesn't break
> something.
>
> Greets
> Niels
>
> vince wrote:
> > Hello,
> >
> > I've been working on trying to improve the performance of directfb 1.3.0
> > on the arm platform. The attached patch will replace the default libc
> > memcpy with a faster implementation. I've tested this patch using an
> > AT91RM9200, but it should work on other ARM targets.
> >
> > Hope this will be useful to others.
> >
> > Regards,
> >
> > Vince
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > directfb-dev mailing list
> > directfb-dev@directfb.org
> > http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev
> >
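The rev2 source isn't included in this message, so the following does not reproduce it. It is a plain-C sketch of one common technique for copying between mutually misaligned buffers (aligned word loads combined with shifts), which also shows why such a routine can end up tied to byte order. `copy_shift_merge`, `load32`, `store32` and `host_is_little_endian` are hypothetical names. The shift directions assume little-endian, so on a big-endian host this sketch simply falls back to libc memcpy. In an actual build, the "show stopper" could instead be a preprocessor check on GCC's `__ARMEB__` macro, which GCC predefines for big-endian ARM targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order probe: returns 1 on little-endian hosts. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Aligned 32-bit load/store. A fixed-size memcpy avoids strict-aliasing
 * problems, and compilers typically turn it into a single word access. */
static uint32_t load32(const unsigned char *p) { uint32_t w; memcpy(&w, p, 4); return w; }
static void store32(unsigned char *p, uint32_t w) { memcpy(p, &w, 4); }

/*
 * Copy n bytes; the main loop uses only word-aligned loads and stores, even
 * when src and dst are misaligned relative to each other.
 * The shift directions below assume little-endian byte order.
 */
static void *copy_shift_merge(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (!host_is_little_endian())
        return memcpy(dst, src, n);   /* shifts would be wrong on big-endian */

    /* Head: byte-copy until dst is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    unsigned k = (unsigned)((uintptr_t)s & 3u);   /* src misalignment, 0..3 */

    if (k == 0) {
        /* Both aligned now: straight word copy. */
        for (; n >= 4; n -= 4, d += 4, s += 4)
            store32(d, load32(s));
    } else if (n >= 8) {
        unsigned keep = 8u * (4u - k);   /* bits already held in `partial` */
        unsigned drop = 8u * k;          /* bits of each new word already stored */
        uint32_t partial = 0;

        /* Gather the 4-k bytes up to the next word-aligned src address. */
        for (unsigned i = 0; i < 4u - k; i++)
            partial |= (uint32_t)s[i] << (8u * i);
        s += 4u - k;                     /* s is now word aligned */

        /* One aligned load and one aligned store per step; n >= 8 keeps
         * every load inside the source buffer. */
        while (n >= 8) {
            assert(((uintptr_t)d & 3u) == 0 && ((uintptr_t)s & 3u) == 0);
            uint32_t next = load32(s);
            store32(d, partial | (next << keep));
            partial = next >> drop;
            s += 4;
            d += 4;
            n -= 4;
        }
        s -= 4u - k;                     /* back to the first uncopied byte */
    }

    /* Tail: remaining bytes one at a time. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

The point of the design is that a misaligned source costs a shift and an OR per word instead of falling back to byte copies; reversing the byte order reverses which end of each word the leftover bytes sit in, which is why the shifts are endian-specific.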
memtest.tar.bz2