On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
> When looking at the assembly I don't like the single-byte loads. Since string (ubyte[] here) is of extraordinary importance, it should be worthwhile to use word loads [0] instead. Really fancy would be SSE.

So far, the results look disappointing. Andrei find does not get faster with wordwise matching:

./benchmark.ldc
    std find: 133 ±25    +38 (3384)  -19 (6486)
 manual find: 140 ±37    +64 (2953)  -25 (6962)
   qznc find: 114 ±17    +33 (2610)  -11 (7262)
  Chris find: 146 ±39    +66 (3045)  -28 (6873)
 Andrei find: 126 ±29    +54 (2720)  -19 (7189)
Wordwise find: 130 ±30    +53 (2934)  -21 (6980)

Interesting side note: on my laptop, Andrei find is faster than qznc find (with LDC), but on my desktop the order reverses (see above). Both machines are Intel i7s. I need to test on a simpler processor; maybe wordwise is faster there. Alternatively, find is purely memory bound and the L1 cache makes every difference disappear.

Also, note how std find is faster than manual find! Finding a reliable benchmark is hard. :/
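
For anyone unfamiliar with the term: wordwise matching here means checking a candidate position with one word-sized load instead of several single-byte loads. Below is a minimal sketch of that idea in C. It is not the D code that was benchmarked above; the names `wordwise_find` and `load32` are hypothetical, and the 4-byte word size is an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 4 bytes from a possibly unaligned address. Going through memcpy
   avoids undefined behavior; optimizing compilers usually emit a single
   load instruction for it. */
static uint32_t load32(const unsigned char *p) {
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical helper: returns the index of the first occurrence of
   needle n in haystack h, or hlen if there is no match. */
size_t wordwise_find(const unsigned char *h, size_t hlen,
                     const unsigned char *n, size_t nlen) {
    if (nlen == 0) return 0;
    if (nlen > hlen) return hlen;

    size_t last = hlen - nlen; /* last valid start position */

    /* Needles shorter than one word fall back to a plain comparison. */
    if (nlen < 4) {
        for (size_t i = 0; i <= last; i++)
            if (memcmp(h + i, n, nlen) == 0) return i;
        return hlen;
    }

    /* Compare the first 4 bytes of the needle with one word load per
       position, then verify the remaining bytes only on a word match. */
    uint32_t first = load32(n);
    for (size_t i = 0; i <= last; i++) {
        if (load32(h + i) == first &&
            memcmp(h + i + 4, n + 4, nlen - 4) == 0)
            return i;
    }
    return hlen;
}
```

The word load at `h + i` always stays in bounds because `i + nlen <= hlen` and `nlen >= 4` on that path. Whether this beats a byte-wise loop depends on the machine, which is consistent with the mixed numbers above.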
