Don wrote:
bearophile wrote:
Don:

It'll be interesting to see what the priorities are now -- maybe this stuff is of more interest these days.

Probably removing bugs is more important still :-)
For example, your work has slightly changed how compile-time functions can be used in D.


BTW the AMD manual for K7 (or it might be the K6 optimisation manual, I don't exactly remember) goes into great detail about both memcpy() and memset(). It turns out there are about five different cases.

In the meantime Deewiant has told me that on 64-bit glibc memset is better, and that on more modern CPUs the timings are different (and on 64 bit my first version may not work; maybe the second one is better. I have not tested it on 64-bit LDC yet). I'm just a newbie at this stuff, while the people who write the memset of 64-bit glibc are experts.

Really, memset() _should_ be optimal in all cases. On almost all compilers it's not, and on many (such as DMD) there's a _lot_ of room for improvement. So I consider this to be a C standard library implementation issue, rather than a language weakness.

Don, I suggest the following.

std.algorithm has a routine called fill(range, value), which semantically subsumes memset. I suggest you specialize fill() for contiguous memory ranges of primitive types (which shouldn't be hard with std.traits), and then optimize the heck out of it.
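
[Editor's sketch: a minimal, hypothetical example of what such a specialization might look like; it is not std.algorithm's actual code, and it assumes a byte-sized scalar element type so the whole fill collapses into one memset call.]

import core.stdc.string : memset;
import std.traits : isScalarType;

// Hypothetical overload: for a built-in array of a byte-sized scalar type,
// the entire fill is a single memset over the array's memory.
void fill(T)(T[] range, T value)
    if (isScalarType!T && T.sizeof == 1)
{
    if (range.length)
        memset(range.ptr, cast(int) value, range.length);
}

unittest
{
    auto buf = new ubyte[64];
    fill(buf, cast(ubyte) 0xAB);
    assert(buf[0] == 0xAB && buf[$ - 1] == 0xAB);
}

A full version would dispatch on element size (and on whether all bytes of the value are equal) before falling back to a plain loop.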

You could do the same with copy(), also in std.algorithm, to implement a super-duper memcpy() routine.
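
[Editor's sketch, same caveats as above: a hypothetical copy() overload for arrays of plain-data elements that forwards the bulk move to memcpy; the non-overlap assumption and names are illustrative, not std.algorithm's.]

import core.stdc.string : memcpy;
import std.traits : hasIndirections;

// Hypothetical overload: arrays of elements containing no pointers can be
// copied with one memcpy. Like std.algorithm.copy, it returns the unfilled
// tail of the target. Assumes the two slices do not overlap.
T[] copy(T)(const(T)[] source, T[] target)
    if (!hasIndirections!T)
{
    assert(target.length >= source.length, "target is too small");
    if (source.length)
        memcpy(target.ptr, source.ptr, source.length * T.sizeof);
    return target[source.length .. $];
}

unittest
{
    int[] src = [1, 2, 3];
    auto dst = new int[5];
    auto rest = copy(src, dst);
    assert(dst[0 .. 3] == [1, 2, 3] && rest.length == 2);
}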

If you go this route people can uniformly use high-level algorithms that specialize themselves whenever applicable.


Andrei
