On 05/30/2016 05:31 AM, qznc wrote:
On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
Looking at the assembly, I don't like the single-byte loads. Since
string (ubyte[] here) is of extraordinary importance, it should be
worthwhile to use word loads [0] instead. Really fancy would be SSE.
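To make the idea concrete, here is a minimal sketch in C (not D, and not the actual benchmark code) of what "word loads" means here: comparing eight bytes at a time through a uint64_t instead of byte by byte. The helper name `equal_wordwise` is mine; memcpy is used for the loads so unaligned accesses stay well-defined, and a decent compiler turns it into a single word load.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Compare two byte ranges 8 bytes at a time, falling back to single
   bytes for the tail. Returns 1 if equal, 0 otherwise. */
static int equal_wordwise(const unsigned char *a, const unsigned char *b,
                          size_t n)
{
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a, sizeof wa);  /* memcpy avoids unaligned-load UB */
        memcpy(&wb, b, sizeof wb);
        if (wa != wb)
            return 0;
        a += sizeof wa;
        b += sizeof wb;
        n -= sizeof wa;
    }
    while (n--)                     /* remaining 0..7 bytes */
        if (*a++ != *b++)
            return 0;
    return 1;
}
```

An SSE version would do the same with 16-byte registers and a movemask compare, at the cost of more setup code.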

So far, the results look disappointing. Andrei find does not get faster
with wordwise matching:

./benchmark.ldc
     std find: 133 ±25    +38 (3384)  -19 (6486)
  manual find: 140 ±37    +64 (2953)  -25 (6962)
    qznc find: 114 ±17    +33 (2610)  -11 (7262)
   Chris find: 146 ±39    +66 (3045)  -28 (6873)
  Andrei find: 126 ±29    +54 (2720)  -19 (7189)
Wordwise find: 130 ±30    +53 (2934)  -21 (6980)

Interesting side note: on my laptop Andrei find is faster than qznc find
(for LDC), but on my desktop it reverses (see above). Both are Intel i7.
I need to test on a simpler processor; maybe wordwise is faster there.
Alternatively, find is purely memory bound and the L1 cache makes every
difference disappear.

Also, note how std find is faster than manual find! Finding a reliable
benchmark is hard. :/

Please throw this hat into the ring as well: it should improve average search time on a large vocabulary dramatically.

https://dpaste.dzfl.pl/dc8dc6e1eb53

It uses a BM-inspired trick: once the last character has matched, if the match subsequently fails, the search needn't resume at the next character in the haystack. The "skip" is computed lazily and in a separate function so as to keep the loop tight. All in all a routine worth a look. I wanted to write this for a long time. -- Andrei
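The trick can be sketched as follows (in C rather than D, and simplified from the dpaste version; the names `find_skip` and `compute_skip` are mine). The needle's last character is checked first; only after that character matches and the full comparison then fails is the Horspool-style skip computed, once, in a separate function, keeping the hot loop small:

```c
#include <stddef.h>
#include <string.h>

/* Distance from the needle's last character back to its previous
   occurrence in the needle (needle length if there is none). Computed
   lazily, only after the first partial-match failure. */
static size_t compute_skip(const char *needle, size_t nlen)
{
    size_t skip = nlen;
    for (size_t i = nlen - 1; i-- > 0; ) {
        if (needle[i] == needle[nlen - 1]) {
            skip = nlen - 1 - i;
            break;
        }
    }
    return skip;
}

/* Returns the offset of needle in haystack, or hlen if not found.
   Checks the needle's last character first; on a failed full match,
   advances by the lazily computed skip instead of by one. */
static size_t find_skip(const char *haystack, size_t hlen,
                        const char *needle, size_t nlen)
{
    if (nlen == 0)
        return 0;
    if (nlen > hlen)
        return hlen;
    size_t skip = 0;                 /* 0 means "not yet computed" */
    const char last = needle[nlen - 1];
    for (size_t pos = nlen - 1; pos < hlen; ) {
        if (haystack[pos] != last) { /* last char mismatch: step by 1 */
            ++pos;
            continue;
        }
        if (memcmp(haystack + pos - (nlen - 1), needle, nlen - 1) == 0)
            return pos - (nlen - 1); /* full match */
        if (skip == 0)
            skip = compute_skip(needle, nlen);
        pos += skip;                 /* last char matched, rest failed */
    }
    return hlen;
}
```

Because the skip is only ever applied after the last character has matched, it is safe: no alignment smaller than the skip can put an occurrence of that character under the matching haystack position.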
