On 05/30/2016 05:31 AM, qznc wrote:
On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
When looking at the assembly I don't like the single-byte loads. Since
string (ubyte[] here) is of extraordinary importance, it should be
worthwhile to use word loads [0] instead. Really fancy would be SSE.
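For readers unfamiliar with the idea, a word load here means scanning eight haystack bytes per iteration instead of one. Below is my own C sketch of the standard SWAR zero-byte trick for locating the needle's first byte; it is only an illustration of the technique, not the code behind the numbers below, and the function name is made up:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Scan for `needle` eight bytes at a time using the SWAR
   "does this word contain a zero byte" test.
   Returns the index of the first occurrence, or `len` if absent. */
static size_t find_byte_wordwise(const unsigned char *hay, size_t len,
                                 unsigned char needle)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * needle; /* needle in every byte lane */
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, hay + i, 8);     /* unaligned-safe word load */
        uint64_t x = word ^ pattern;   /* zero byte where needle occurs */
        if ((x - ones) & ~x & highs)   /* any zero byte in x? */
            break;                     /* pinpoint it bytewise below */
    }
    for (; i < len; i++)               /* tail and hit localization */
        if (hay[i] == needle)
            return i;
    return len;
}
```

Once the first byte is located this way, the full needle comparison proceeds as usual; the hope is that the word loop filters out most positions cheaply.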
So far, the results look disappointing. Andrei find does not get faster
with wordwise matching:
./benchmark.ldc
std find: 133 ±25 +38 (3384) -19 (6486)
manual find: 140 ±37 +64 (2953) -25 (6962)
qznc find: 114 ±17 +33 (2610) -11 (7262)
Chris find: 146 ±39 +66 (3045) -28 (6873)
Andrei find: 126 ±29 +54 (2720) -19 (7189)
Wordwise find: 130 ±30 +53 (2934) -21 (6980)
Interesting side note: on my laptop, Andrei find is faster than qznc find
(with LDC), but on my desktop it is the reverse (see above). Both are
Intel i7s. I need to find a simpler processor; maybe wordwise is faster
there. Alternatively, find is purely memory-bound and the L1 cache makes
every difference disappear.
Also, note how std find is faster than manual find! Finding a reliable
benchmark is hard. :/
Please throw this hat into the ring as well: it should improve average
search times on a large vocabulary dramatically.
https://dpaste.dzfl.pl/dc8dc6e1eb53
It uses a BM (Boyer-Moore)-inspired trick: once the last character has
matched, if the match subsequently fails it needn't resume from the very
next character in the haystack. The "skip" is computed lazily and in a
separate function so as to keep the loop tight. All in all, a routine
worth a look. I wanted to write this for a long time. -- Andrei
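Roughly, the trick as described reads like this in C. This is my own sketch of the idea, not the code in the paste; `compute_skip` and `find_lastchar` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Distance from the needle's last character back to its previous
   occurrence inside the needle; nlen if there is none. Called lazily,
   and only once, so the hot loop below stays tight. */
static size_t compute_skip(const char *needle, size_t nlen)
{
    for (size_t j = nlen - 1; j > 0; j--)
        if (needle[j - 1] == needle[nlen - 1])
            return nlen - j;
    return nlen;
}

/* Returns the index of the first match, or hlen if absent
   (analogous to D's find returning an empty range). */
static size_t find_lastchar(const char *hay, size_t hlen,
                            const char *needle, size_t nlen)
{
    if (nlen == 0) return 0;
    if (nlen > hlen) return hlen;
    const char last = needle[nlen - 1];
    size_t skip = 0;                   /* 0 = not computed yet */
    size_t i = 0;
    while (i + nlen <= hlen) {
        if (hay[i + nlen - 1] != last) {
            i++;                       /* cheap test, tight loop */
            continue;
        }
        if (memcmp(hay + i, needle, nlen - 1) == 0)
            return i;
        /* Last char matched but the full match failed: any smaller
           shift would need another occurrence of `last` inside the
           needle, so we may jump by `skip`. */
        if (skip == 0)
            skip = compute_skip(needle, nlen);
        i += skip;
    }
    return hlen;
}
```

The point of laziness is that for needles that never get a last-character hit, the skip table work is never paid at all.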