On 15/03/15 08:33, Kristoffer Brånemyr wrote:
> Hi,
>
> I did some tests and found out you can actually beat memchr with a simple
> loop. Tests were done on a Intel Xeon E3-1231v3 (4*3.4GHz), on a 4GB file
> that was already cached in memory. Benchmarking was done simply with the
> 'time' command. I don't know how this code would run on other architectures,
> but I guess you could put it in an #ifdef?
>
> Coreutils 2.83 version, compiled with -O3:
> 507755520 /home/ztion/words
>
> real 0m3.126s
> user 0m2.699s
> sys  0m0.429s
>
>
> Improved version compiled with -O2:
> 507755520 /home/ztion/words
>
> real 0m2.857s
> user 0m2.461s
> sys  0m0.396s
>
> Improved version compiled with -O3:
> 507755520 /home/ztion/words
>
> real 0m1.518s
> user 0m1.157s
> sys  0m0.361s
>
> I studied the generated assembly and with -O3 gcc generates some fancy SSE
> code, getting some nice speedups. memchr is also SSE optimized as far as I
> know, so it's interesting that this is so much faster, twice as fast actually.
>
> In case you don't like turning -O3 on for some reason (the default in
> coreutils is -O2 i think), the best version I could put together for -O2 was
> this:
>
> Improved version 2, compiled with -O2:
> 507755520 /home/ztion/words
>
> real 0m2.206s
> user 0m1.827s
> sys  0m0.379s

Interesting. Thanks for the results.
I use 'gcc -march=native -g -O3' locally,
and with that can't see a difference in performance.

What version of glibc and gcc are you using?
gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.

thanks,
Pádraig.
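
For anyone following along, the two approaches being compared might look roughly like the sketch below. This is not the code from the proposed patch, which isn't shown in this message. It assumes the workload is newline counting over a buffer, and the names count_lines_memchr and count_lines_loop are made up for illustration. The second function is the kind of plain byte loop that gcc can auto-vectorize at -O3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count '\n' bytes by repeatedly calling memchr
   on the remaining part of the buffer.  */
static size_t
count_lines_memchr (const char *buf, size_t len)
{
  size_t lines = 0;
  const char *p = buf;
  const char *end = buf + len;

  while ((p = memchr (p, '\n', (size_t) (end - p))) != NULL)
    {
      ++p;      /* step past the newline just found */
      ++lines;
    }
  return lines;
}

/* Hypothetical helper: count '\n' bytes with a plain loop.  The body is
   branch-free, which leaves the compiler free to vectorize it.  */
static size_t
count_lines_loop (const char *buf, size_t len)
{
  size_t lines = 0;

  for (size_t i = 0; i < len; i++)
    lines += (buf[i] == '\n');
  return lines;
}
```

Which one wins will likely depend on average line length as well as on the gcc and glibc versions: memchr pays a call overhead per newline, so short lines favour the plain loop, while long lines favour memchr's optimized scan.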
