WTF ?! basic computations are faster in py3 than nim1.6 ?!

shirleyquirk Sun, 24 Oct 2021 09:00:12 -0700

It's actually a great example of how to write fast Nim code; there are a few 
pitfalls lurking about.


First of all, you found the compiler flags: one can get very far with 
`-d:release` and then `-d:danger`, but as you say, the algorithm matters 
most.

To start with, we can add an accessor template to make some of the grid math 
clearer, and create a `Sudoku` type (that's really just a string).
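A minimal sketch of such a type and accessor. It assumes the grid is stored row-major as an 81-character string with `'.'` for empty cells; that layout and the `[]`/`[]=` overloads are assumptions for illustration, not necessarily what the playground version does.

```nim
# Assumption: a Sudoku is an 81-char row-major string, '.' = empty cell.
type Sudoku = string

template `[]`(s: Sudoku; x, y: int): char =
  ## Read the cell at column x, row y (both 0..8).
  s[y * 9 + x]

template `[]=`(s: Sudoku; x, y: int; c: char) =
  ## Write the cell at column x, row y.
  s[y * 9 + x] = c
```

With this, `g[x, y]` reads like grid coordinates, while `g[i]` still indexes the flat string.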

You've written in a functional style, with many string copies, which is great 
for robustness, but I'm going for speed, so the first thing I tried was 
replacing all the procs with `iterator():char`.
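For example, the row, column and box accessors could become iterators that yield one `char` at a time instead of building a new string. This sketch assumes the same 81-character row-major layout; the names `row` and `col` are illustrative (only `square` appears in the post).

```nim
# Assumption: 81-char row-major grid, '.' = empty cell.
type Sudoku = string

# Plain Nim iterators are inline: the caller's loop body is pasted into
# the iterator at compile time, so no intermediate string is allocated.
iterator row(s: Sudoku; y: int): char =
  for x in 0 ..< 9:
    yield s[y * 9 + x]

iterator col(s: Sudoku; x: int): char =
  for y in 0 ..< 9:
    yield s[y * 9 + x]

iterator square(s: Sudoku; x, y: int): char =
  ## Yields the 9 cells of the 3x3 box containing (x, y).
  let bx = (x div 3) * 3
  let by = (y div 3) * 3
  for dy in 0 ..< 3:
    for dx in 0 ..< 3:
      yield s[(by + dy) * 9 + bx + dx]
```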

Here we run into a problem: how do you pass the new `square` iterator into 
`freeset`? We could use closure iterators, but for speed we really want 
inline ones, and inline iterators can't be passed to a proc as a value.

The answer is to turn `freeset` into a template, which pastes the iterator 
call into its own loop. With that done, the speed almost doubles.
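A sketch of what that can look like, assuming `freeset` returns the digits not present among the cells it is given, and that the final candidates are the intersection over row, column and box. The `set[char]` representation and the `candidates` helper are assumptions, not taken from the linked code.

```nim
# Assumption: 81-char row-major grid, '.' = empty cell.
type Sudoku = string

iterator row(s: Sudoku; y: int): char =
  for x in 0 ..< 9: yield s[y * 9 + x]

iterator col(s: Sudoku; x: int): char =
  for y in 0 ..< 9: yield s[y * 9 + x]

iterator square(s: Sudoku; x, y: int): char =
  let bx = (x div 3) * 3
  let by = (y div 3) * 3
  for dy in 0 ..< 3:
    for dx in 0 ..< 3: yield s[(by + dy) * 9 + bx + dx]

# A template receives the call `square(g, x, y)` as untyped code and
# expands it inside its own `for` loop, so the iterator stays inline.
template freeset(cells: untyped): set[char] =
  var avail = {'1'..'9'}
  for c in cells:
    avail.excl c        # '.' is not in the set, so excluding it is a no-op
  avail

proc candidates(g: Sudoku; x, y: int): set[char] =
  ## Digits that can legally go at column x, row y.
  let r = freeset(row(g, y))
  let c = freeset(col(g, x))
  let b = freeset(square(g, x, y))
  r * c * b             # set intersection
```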

But more is possible: the recursive function `resolv` still makes copies, but 
with a few changes those can be done away with, leaving everything operating 
on a single string.
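One way to get there is in-place backtracking: write a candidate into the first empty cell, recurse on the same string, and restore the `'.'` before reporting failure. This is a sketch under the same assumptions as above (81-character grid, `'.'` for empty cells), and it returns a `bool` instead of a new string; it is not necessarily how the linked version does it.

```nim
import std/strutils

# Assumption: 81-char row-major grid, '.' = empty cell.
type Sudoku = string

iterator row(s: Sudoku; y: int): char =
  for x in 0 ..< 9: yield s[y * 9 + x]

iterator col(s: Sudoku; x: int): char =
  for y in 0 ..< 9: yield s[y * 9 + x]

iterator square(s: Sudoku; x, y: int): char =
  let bx = (x div 3) * 3
  let by = (y div 3) * 3
  for dy in 0 ..< 3:
    for dx in 0 ..< 3: yield s[(by + dy) * 9 + bx + dx]

template freeset(cells: untyped): set[char] =
  var avail = {'1'..'9'}
  for c in cells: avail.excl c
  avail

proc candidates(g: Sudoku; x, y: int): set[char] =
  let r = freeset(row(g, y))
  let c = freeset(col(g, x))
  let b = freeset(square(g, x, y))
  r * c * b

proc resolv(g: var Sudoku): bool =
  ## Solves `g` in place; returns false if this branch has no solution.
  let i = g.find('.')
  if i < 0:
    return true                 # no empty cell left: solved
  let cands = candidates(g, i mod 9, i div 9)
  for d in cands:
    g[i] = d                    # try a digit in place, no copy
    if resolv(g):
      return true
  g[i] = '.'                    # undo before backtracking
  return false
```

Because the only mutation is a single `char` write per step, and every failed branch undoes its own write, the whole search runs on one allocation.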

At that point, things are so fast that the speed of `find` starts to matter. 
I have found that C's `strchr`, combined with `--passC:-march=native`, uses 
SIMD vectorization and can be faster still. For me the difference is between 
about 160 ms and 150 ms.
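A sketch of how such a switch might be wired up, assuming the `-d:useStrChr` define named in the post selects libc's `strchr` through `importc`, with `strutils.find` as the default. The helper name `findEmpty` is hypothetical, and whether `strchr` wins depends on your libc and CPU, so it is worth measuring.

```nim
# Hypothetical helper: index of the first '.' in the grid, or -1.
when defined(useStrChr):
  # Bind directly to the C library's strchr.
  proc strchr(s: cstring; c: cint): cstring {.importc, header: "<string.h>".}

  proc findEmpty(g: string): int =
    let base = g.cstring
    let p = strchr(base, cint('.'))
    if p.isNil: -1                          # not found
    else: cast[int](p) - cast[int](base)    # pointer difference = index
else:
  import std/strutils

  proc findEmpty(g: string): int =
    g.find('.')                             # -1 when not found
```

Both branches share the same contract, so the solver can call `findEmpty` and pick the implementation at compile time.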

And here it is after all that:

<https://play.nim-lang.org/#ix=3CFR>

I compile with `nim c -d:danger`, but you can also try adding 
`--passC:-march=native -d:useStrChr` for an extra 5%.
