I got a 1.76x speedup from PGO over `-d:danger` (86/49) with gcc-10.2 on Linux, 4.7GHz Skylake (default GC):

- `-d:release`: 120 ms
- `-d:danger`: 86 ms
- PGO: 49 ms

The full range of performance (120/49 = 2.45x) is comparable to @jrfondren's 198/78 = 2.54x, so I suspect clang PGO would be similar (I do not have a script set up for that, but [see here](https://forum.nim-lang.org/t/6295)). @apardes reported a full 4.0x ratio, so there may still be about 1.6x (4.0/2.45) left to explain and/or some Nim-level optimization that could be done. It is also possible that differences in compilation cover the gap for him.

`vdivsd` showed up at the top of a quick profile for me. Multiplying by the reciprocal may be faster than dividing in `proc /(v: Vector, c: float64)`. Maybe Rust is smart enough to do that here? Or perhaps some other micro-optimization would help?
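
To make the reciprocal idea concrete, here is a rough sketch. I am assuming `Vector` is a plain object with three `float64` fields named `x`, `y`, `z`; the real type in the benchmark may be laid out differently, so treat the names as hypothetical.

```nim
# Assumed layout; the benchmark's actual Vector type may differ.
type
  Vector = object
    x, y, z: float64

# One division to form the reciprocal, then three multiplications,
# instead of three separate divisions.
proc `/`(v: Vector, c: float64): Vector {.inline.} =
  let inv = 1.0 / c
  Vector(x: v.x * inv, y: v.y * inv, z: v.z * inv)

when isMainModule:
  let v = Vector(x: 2.0, y: 4.0, z: 8.0) / 2.0
  echo v
```

Note that `v.x * (1.0 / c)` is not always bit-identical to `v.x / c`, which is why gcc will not make this substitution itself unless something like `-freciprocal-math` (implied by `-ffast-math`) is passed, e.g. via `--passC`. Whether it actually helps here would need measuring.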