On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
When looking at the assembly I don't like the single-byte
loads. Since string (ubyte[] here) is of extraordinary
importance, it should be worthwhile to use word loads [0]
instead. Really fancy would be SSE.
So far, the results look disappointing. Adding wordwise matching
to Andrei find does not make it faster:
./benchmark.ldc
std find: 133 ±25 +38 (3384) -19 (6486)
manual find: 140 ±37 +64 (2953) -25 (6962)
qznc find: 114 ±17 +33 (2610) -11 (7262)
Chris find: 146 ±39 +66 (3045) -28 (6873)
Andrei find: 126 ±29 +54 (2720) -19 (7189)
Wordwise find: 130 ±30 +53 (2934) -21 (6980)
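For readers wondering what "wordwise matching" means here, below is a minimal sketch of the idea. It is written in C for brevity and is not the D code that was benchmarked above. The names wordwise_find and load32 are made up for illustration, and the 4-byte word size is an assumption. The first four bytes of the needle are compared with a single word load, and the rest of the needle is checked only when that word matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 4 bytes as one word. memcpy keeps the unaligned access
   well-defined; compilers typically emit a single load for it. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical helper: index of the first occurrence of needle in
   hay, or hlen if there is none. An empty needle matches at 0. */
size_t wordwise_find(const unsigned char *hay, size_t hlen,
                     const unsigned char *needle, size_t nlen)
{
    assert(hay != NULL && needle != NULL);
    if (nlen == 0) return 0;
    if (nlen > hlen) return hlen;

    size_t last = hlen - nlen;  /* last valid start position */

    if (nlen < sizeof(uint32_t)) {
        /* Needle shorter than one word: plain bytewise fallback. */
        for (size_t i = 0; i <= last; i++)
            if (memcmp(hay + i, needle, nlen) == 0) return i;
        return hlen;
    }

    uint32_t first = load32(needle);
    for (size_t i = 0; i <= last; i++) {
        /* In bounds: i + 4 <= i + nlen <= hlen. */
        if (load32(hay + i) == first &&
            memcmp(hay + i + 4, needle + 4, nlen - 4) == 0)
            return i;
    }
    return hlen;
}
```

Note that the loop still advances one byte at a time, so each haystack byte is loaded up to four times. That may be part of why this kind of change does not pay off on a modern out-of-order core.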
Interesting side-note: On my laptop Andrei find is faster than
qznc find (for LDC), but on my desktop the order reverses (see
above). Both machines are Intel i7. I need to test on a simpler
processor; maybe wordwise is faster there. Alternatively, find is
purely memory bound and the L1 cache makes every difference
disappear.
Also, note how std find is faster than manual find! Finding a
reliable benchmark is hard. :/