Yes, your guess that missing inlining is the problem seems to be correct.
We generally use link-time optimization, which works really well with GCC 10. Try:
$ nim c -d:release --passC:-flto t.nim
$ ./t
julia1: 114 ms (sum of pixels: 27677748)
julia2: 115 ms (sum of pixels: 27677748)
julia3: 111 ms (sum of pixels: 27677748)
There are more options to tweak, of course, such as ARC or -march=native.
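
As a rough sketch of those extra options, the same build could be extended along the following lines. The file name t.nim is the one from the command above. Which ARC switch applies depends on the Nim version (--gc:arc on older releases, --mm:arc on newer ones), so treat that part as an assumption and check it against your compiler's help output.

```shell
# Same LTO build as before, additionally tuned for the host CPU.
nim c -d:release --passC:-flto --passC:-march=native t.nim

# The same build with ARC memory management
# (assumption: older Nim; use --mm:arc instead on newer versions).
nim c -d:release --gc:arc --passC:-flto --passC:-march=native t.nim
```

Keep in mind that a binary built with -march=native targets the CPU of the build machine and may not run on older processors.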
