Don wrote:
> bearophile wrote:
>> Don:
>>> It'll be interesting to see what the priorities are now -- maybe this stuff is of more interest now.
>> Probably removing bugs is more important still :-)
>> For example, your work has changed a little how compile-time functions can be used in D.
>>> BTW the AMD manual for K7 (or it might be the K6 optimisation manual? I don't exactly remember) goes into great detail about both memcpy() and memset(). It turns out there are about five different cases.
>> In the meantime Deewiant has told me that on 64-bit systems the glibc memset is better, and that on more modern CPUs the timings are different (and on 64-bit my first version may not work; maybe the second one is better. I have not tested it on 64-bit LDC yet). I'm just a newbie on this stuff, while the people who write the memset of 64-bit glibc are experts.
> Really, memset() _should_ be optimal in all cases. On almost all compilers it's not optimal, and on many (such as DMD) there's a _lot_ of room for improvement. So I consider this to be a C standard library implementation issue rather than a language weakness.
Don, I suggest the following.
std.algorithm has a routine called fill(range, value), which semantically subsumes memset. I suggest you specialize fill() for contiguous memory ranges of primitive types (which shouldn't be hard with std.traits), and then optimize the heck out of it.
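Roughly something like this -- an untested sketch, not actual Phobos code; fastFill and the isScalarType constraint are just stand-ins for whatever overload and std.traits test the real specialization would use:

  import core.stdc.string : memset;
  import std.traits : isScalarType;

  // Sketch: a fill() specialization for built-in arrays of scalar types.
  void fastFill(T)(T[] range, T value)
      if (isScalarType!T)
  {
      static if (T.sizeof == 1)
      {
          // Single-byte element types map directly onto memset.
          if (range.length)
              memset(range.ptr, cast(int) value, range.length);
      }
      else
      {
          // Wider element types: a plain loop for now; this is where
          // hand-tuned SSE / rep stos code would eventually go.
          foreach (ref e; range)
              e = value;
      }
  }

  unittest
  {
      auto buf = new ubyte[1000];
      fastFill(buf, cast(ubyte) 7);   // takes the memset path
      assert(buf[0] == 7 && buf[$ - 1] == 7);

      auto nums = new double[100];
      fastFill(nums, 1.5);            // takes the loop path
      assert(nums[0] == 1.5 && nums[$ - 1] == 1.5);
  }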
You could do the same with copy(), also in std.algorithm, to implement a
super-duper memcpy() routine.
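Same idea for copy() -- again just a rough, untested sketch with made-up names, and it assumes the slices don't overlap (memmove would be needed otherwise):

  import core.stdc.string : memcpy;
  import std.traits : isScalarType;

  // Sketch: a copy() specialization for slices of scalar element types.
  T[] fastCopy(T)(const(T)[] source, T[] target)
      if (isScalarType!T)
  {
      assert(target.length >= source.length, "target is too small");
      if (source.length)
          memcpy(target.ptr, source.ptr, source.length * T.sizeof);
      // Like std.algorithm.copy, return the still-unfilled tail of target.
      return target[source.length .. $];
  }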
If you go this route, people can uniformly use high-level algorithms that specialize themselves whenever applicable.
Andrei