Tomas Frydrych wrote:

> > Like Dirk already replied, the implementation is in macros in the .h
> > file.

> I see. That makes the comparison with memcpy somewhat unfair, since you
> are not actually providing replacement functions, so this would only
> make a difference for -O3-type optimisation (where you trade code size
> for speed); it would be interesting to see what the performance
> difference is if you add the C prologue and epilogue.

Memory bandwidth benchmarking is done on a 2MB memory block, so the
prologue and epilogue code does not introduce any noticeable difference.
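A rough sketch of this kind of measurement in C, assuming a POSIX system with `clock_gettime()` and counting only the bytes copied (not read plus written). The 2MB block size comes from the post; the helper name `memcpy_bandwidth_mbs` and the iteration count are hypothetical, and this is not the benchmark code from the thread. With each call moving 2,097,152 bytes, a few bytes of alignment handling at either end of the copy cannot show up in the result.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* 2MB buffer, matching the block size mentioned in the thread. */
#define BENCH_SIZE (2u * 1024u * 1024u)

/* Hypothetical helper: copies a BENCH_SIZE buffer `iterations` times with
 * memcpy() and returns the copy rate in MB/s (bytes copied, not read+write).
 * Returns -1.0 if iterations <= 0, allocation fails, or the timer reports
 * no elapsed time. */
double memcpy_bandwidth_mbs(int iterations)
{
    if (iterations <= 0)
        return -1.0;

    unsigned char *src = malloc(BENCH_SIZE);
    unsigned char *dst = malloc(BENCH_SIZE);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0x5a, BENCH_SIZE);
    memset(dst, 0, BENCH_SIZE);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        memcpy(dst, src, BENCH_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Sanity check: the destination must now hold the source pattern. */
    assert(memcmp(dst, src, BENCH_SIZE) == 0);
    free(src);
    free(dst);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)iterations * BENCH_SIZE / secs / (1024.0 * 1024.0);
}
```

Building the same file once with -O2 and once with -O3 gives a quick comparison of the system memcpy at different optimisation levels; a replacement routine could be measured by swapping the memcpy() call inside the timed loop.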

I have not paid much attention to optimizing the prologue/epilogue code
yet. It should make a difference for smaller buffer sizes, but it is on
my TODO list.
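To make the terms concrete: in a block-copy routine, the prologue copies leading bytes until the destination is aligned, the bulk loop moves whole words, and the epilogue copies whatever is left over. The following is a minimal portable C sketch under that assumption; `copy_with_prologue` is a hypothetical name, 32-bit words are assumed, and it is not the ARM assembly or macro implementation discussed in this thread. Each byte loop runs at most 3 times, which is why they vanish in a 2MB benchmark but can be a large share of the work for a 16-byte copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the prologue / bulk / epilogue structure. */
void *copy_with_prologue(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Prologue: copy single bytes until the destination is 4-byte aligned
     * (at most 3 iterations). */
    while (n > 0 && ((uintptr_t)d % sizeof(uint32_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Invariant: either nothing is left, or the destination is aligned. */
    assert(n == 0 || ((uintptr_t)d % sizeof(uint32_t)) == 0);

    /* Bulk loop: move whole 32-bit words. Fixed-size memcpy() calls avoid
     * unaligned-access and strict-aliasing problems when the source is not
     * aligned; compilers usually turn them into plain loads and stores. */
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Epilogue: copy the remaining 0-3 trailing bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A hand-tuned ARM version would more likely handle misaligned sources with shifts or multi-register load/store instructions; the portable word loop here only shows where the prologue and epilogue sit relative to the bulk copy.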

> BTW, you can instruct gcc to use inlined assembler versions of its
> memcpy and friends as well; I think -O3 includes this. But if I read
> bits/string.h in my sbox correctly, there are no such inlined functions
> on ARM, so there is certainly value in doing this.

Well, you've got the source, so you can run your own benchmarks with
-O2, -O3 or even -O9 and post them here :)


_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
