27-May-2013 01:04, Kiith-Sa wrote:
WRT the worse Linux64 case: I recommend running it in an infinite loop and watching it in perf top.
(If you're on Ubuntu/derivative or maybe Debian, just type "perf top", it will tell you what package to install, and once installed, "perf top" again, while the benchmark is running) You'll get a precise real-time line-wise (with ability to drill down to ASM) profile (like "top", but for functions). With some command-line options (google "linux perf"), you can also look at cache misses, branch mispredictions, and so on. Compare that with the original version and you might find why it's slower. (Don't have time to test anything right now)
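For reference, a rough sketch of how that might look on the command line. The binary name ./benchmark and the helper function names are hypothetical, just for illustration; the perf subcommands and event names themselves (top, stat, record, report, cache-misses, branch-misses) are standard, though which hardware events are available depends on the CPU and kernel.

```shell
# Hypothetical benchmark binary; point BENCH at your own executable.
BENCH="${BENCH:-./benchmark}"

# Live, top-like view of the hottest functions in the running benchmark.
# Assumes the benchmark is already looping in another terminal.
live_profile() {
    perf top -p "$(pgrep -n -f "$BENCH")"
}

# Count cache misses and branch mispredictions over one full run.
count_events() {
    perf stat -e cache-misses,branch-misses "$BENCH"
}

# Record a profile with call graphs, then browse it afterwards
# (perf report lets you drill down to annotated assembly).
record_profile() {
    perf record -g "$BENCH" && perf report
}
```

Running count_events on both the original and the new version should give numbers that can be compared side by side.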
Just tried it. Now I at least see that in 32-bit my version is faster, whereas in 64-bit it isn't (that is with DMD). One curiosity is that the code for the ASCII case is the same, yet even on English text the difference is about the same. Another is that neither function is even partially inlined.
-- Dmitry Olshansky