On 10.08.2018 18:04, Wilco Dijkstra wrote:

A quick benchmark shows it's faster up to about 10 bytes, but after that it 
becomes extremely slow. At 16 bytes it's already 2.5 times slower and for 
larger sizes it's over 13 times slower than the GLIBC implementation...

The implementation falls back to the library call if the
string is not aligned.

If it did that for larger sizes then it would be fine. However, a byte loop is 
unacceptably slow.

Also given the large amount of inlined code, it would make sense to handle 
larger sizes than 8. It may be worth comparing a loop doing 8 bytes per 
iteration with the GLIBC strlen, or just inlining the first 16 bytes and then 
falling back to strlen.
Also if you have statistics that show tiny strlen sizes are much more common 
then the strlen implementation could be further tuned for that.
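
A minimal sketch of the "inline the first 16 bytes, then fall back to strlen" idea, for illustration only. It is not the patch under discussion: the names strlen_inline16 and has_zero_byte are hypothetical, 8-byte alignment is assumed as the fast-path condition, and it relies on an aligned 8-byte load never crossing a page boundary, so the load may read a few bytes past the terminator (which strict ISO C does not permit, although library implementations commonly do it).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero iff at least one byte of w is zero. */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical name. Checks the first 16 bytes one 8-byte word at a time,
   then hands the remainder to the library strlen. */
static size_t strlen_inline16(const char *s)
{
    /* Unaligned input: use the library call, as the patch is said to do. */
    if ((uintptr_t)s % 8 != 0)
        return strlen(s);

    for (size_t off = 0; off < 16; off += 8) {
        uint64_t w;
        memcpy(&w, s + off, 8);   /* aligned word load, stays within one page */
        if (has_zero_byte(w)) {
            /* The terminator is inside this word; find it with a short,
               endian-neutral byte scan of at most 8 steps. */
            size_t i = off;
            while (s[i] != '\0')
                i++;
            return i;
        }
    }

    /* No terminator in the first 16 bytes: fall back to the library. */
    return 16 + strlen(s + 16);
}
```

The byte scan runs only inside the word already known to contain the terminator, so the per-byte loop is bounded at 8 iterations instead of growing with the string length. Whether this beats a plain strlen call at small sizes would still need to be benchmarked.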

Valid points, thanks. I will consider that.
BTW, what HW did you use for the benchmarking?
