"foreach(i, dchar c; s)" vs "decode"

monarch_dodra Sun, 25 Nov 2012 13:40:28 -0800

I spent *all* week benchmarking a string processing function. And now, at the end of the week, I can safely say that the compiler's "foreach" is slower than a Phobos decode-based loop.


Basically, given a
----
foreach(i, dchar c; s)
{codeCodeCode;}
----
 loop, I replaced it with:
----
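{
    import std.utf : decode; // Phobos UTF decoder

    size_t i;                // start index of the current code point
    size_t j;                // decode advances this past the code point
    immutable k = s.length;
    dchar c;
    for ( ; i < k ; i = j )
    {
        c = decode(s, j);
        codeCodeCode;
    }
}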
----
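As a sanity check that the two loops really visit the same positions, here is a minimal sketch. The helper names indicesViaForeach and indicesViaDecode are made up for illustration, and the loop body is reduced to recording the start index of each code point; it is not the function I actually benchmarked.

```d
import std.utf : decode;

/// Hypothetical helper: start index of each code point,
/// found with the compiler's decoding foreach.
size_t[] indicesViaForeach(string s)
{
    size_t[] result;
    foreach (i, dchar c; s)
        result ~= i;
    return result;
}

/// Hypothetical helper: same result using std.utf.decode,
/// which advances j past the code point it just decoded.
size_t[] indicesViaDecode(string s)
{
    size_t[] result;
    size_t i, j;
    immutable k = s.length;
    for (; i < k; i = j)
    {
        dchar c = decode(s, j);
        result ~= i;
    }
    return result;
}

void main()
{
    // 1-, 2-, 3- and 1-byte UTF-8 sequences, in that order.
    string s = "aé€z";
    size_t[] expected = [0, 1, 3, 6];
    assert(indicesViaForeach(s) == expected);
    assert(indicesViaDecode(s) == expected);
}
```

Both versions yield the same indices, so in the replacement loop "i" plays the same role as the foreach index.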

And my algorithms instantly gained a 10-25% performance improvement(!). I benched using varied sources of data, in particular, both ASCII-only strings and Unicode-heavy text.

Unicode-heavy text sees the larger gains, but raw ASCII text *also* gains :/

This holds true for both UTF-8 and UTF-16.

UTF-32 is different, because foreach has the "unfair" advantage of not validating the code points...

I got these results on the 2.061 alpha release, with Phobos built in release mode, both with and without -inline.

So if any of the compiler guys are reading this... I have no idea how the Unicode foreach is actually implemented, but there *should* be substantial gains to be had...