On 15/03/15 21:14, Kristoffer Brånemyr wrote:
>
>> On Sunday, 15 March 2015 20:13, Pádraig Brady wrote:
>>
>>> On 15/03/15 08:33, Kristoffer Brånemyr wrote:
>>>
>>> Hi,
>>>
>>> I did some tests and found out you can actually beat memchr with a simple
>>> loop. Tests were done on an Intel Xeon E3-1231v3 (4*3.4GHz), on a 4GB file
>>> that was already cached in memory. Benchmarking was done simply with the
>>> 'time' command. I don't know how this code would run on other
>>> architectures, but I guess you could put it in an #ifdef?
>>>
>>> Coreutils 2.83 version, compiled with -O3:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m3.126s
>>> user 0m2.699s
>>> sys  0m0.429s
>>>
>>>
>>> Improved version compiled with -O2:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m2.857s
>>> user 0m2.461s
>>> sys  0m0.396s
>>>
>>> Improved version compiled with -O3:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m1.518s
>>> user 0m1.157s
>>> sys  0m0.361s
>>>
>>> I studied the generated assembly and with -O3 gcc generates some fancy SSE
>>> code, getting some nice speedups. memchr is also SSE optimized as far as I
>>> know, so it's interesting that this is so much faster, twice as fast
>>> actually.
>>>
>>> In case you don't like turning -O3 on for some reason (the default in
>>> coreutils is -O2 I think), the best version I could put together for -O2
>>> was this:
>>>
>>> Improved version 2, compiled with -O2:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m2.206s
>>> user 0m1.827s
>>> sys  0m0.379s
>>
>> Interesting. Thanks for the results.
>> I use 'gcc -march=native -g -O3' locally, and with that can't see a
>> difference in performance.
>>
>> What version of glibc and gcc are you using?
>> gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.
>>
>> thanks,
>> Pádraig.
>
> Hi,
>
> This is with gcc 4.9.2-7 and glibc 2.19-17 on Debian amd64. The difference is
> still there for me when compiling with your CFLAGS. Have they improved memchr
> in glibc 2.20? I don't think they have that yet in Debian unfortunately.
>
> What cpu do you have?

i3-2310M

I was doing a very quick test with _short_ lines.
Specifically /usr/share/dict/words

Note GCC should be using __builtin_memchr here,
so not hitting the function call overhead.

I'll look in more detail later.

thanks,
Pádraig.
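
For readers following along, the two approaches being compared can be sketched roughly as below. This is only an illustration, not the code Kristoffer benchmarked (his patch is not included in this message); the function names count_lines_memchr and count_lines_loop are hypothetical. The first counts newlines by calling memchr repeatedly, the second uses a plain byte loop of the kind a compiler may be able to vectorize at -O3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count '\n' bytes by jumping from one newline to
   the next with memchr.  Each call scans only up to the next match, so
   short lines mean many short calls.  */
static size_t
count_lines_memchr (const char *buf, size_t len)
{
  size_t lines = 0;
  const char *p = buf;
  const char *end = buf + len;

  while ((p = memchr (p, '\n', (size_t) (end - p))) != NULL)
    {
      ++p;        /* step past the newline just found */
      ++lines;
    }
  return lines;
}

/* Hypothetical helper: count '\n' bytes with a simple loop.  The
   branch-free accumulation is an assumption about what makes the loop
   easy for the compiler to turn into SIMD code; whether that happens
   depends on the compiler and flags.  */
static size_t
count_lines_loop (const char *buf, size_t len)
{
  size_t lines = 0;

  for (size_t i = 0; i < len; i++)
    lines += (buf[i] == '\n');
  return lines;
}
```

Both functions return the same count for any buffer; which one is faster would depend on line length, compiler flags and the libc in use, as the numbers in this thread suggest.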
