Since I am basically the one who took over nim_glm maintenance, I should point out that the version you fixed is very old and is no longer the version I currently maintain. I should also mention that I never did performance benchmarks on that library, only correctness tests. I applied my knowledge of what is necessary to potentially get good performance, but I never actually ensured that the performance was optimal. Still, it's good to see that it is not horrible.
One thing I haven't seen you talk about is alignment. Have you tried using glm's vec4 type in C++ instead of vec3? To my knowledge, only a self-aligned vec4 or vec2 type can be fully optimized to use SIMD instructions. That could give the C++ version a performance advantage that the Nim version doesn't have.
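To illustrate the alignment point, here is a minimal sketch using two hypothetical hand-written structs (Vec3, Vec4, and add are illustrative names, not glm's actual definitions or API). It shows that a three-float vector is 12 bytes with only float alignment, while a four-float vector forced to 16-byte alignment fills exactly one 128-bit SIMD register. The layout checks assume a mainstream ABI where float is 4 bytes and structs add no padding beyond what alignas requires.

```cpp
// Hypothetical minimal vector types, not glm's real definitions.

// Three packed floats: 12 bytes, aligned only like a single float.
struct Vec3 {
    float x, y, z;
};

// Four floats forced to 16-byte alignment, matching the width
// of a 128-bit SIMD register (e.g. SSE on x86, NEON on ARM).
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Layout checks, assuming a mainstream ABI with 4-byte floats.
static_assert(sizeof(Vec3) == 12, "Vec3 holds three packed floats");
static_assert(alignof(Vec3) == alignof(float), "Vec3 has only float alignment");
static_assert(sizeof(Vec4) == 16, "Vec4 fills exactly one 128-bit register");
static_assert(alignof(Vec4) == 16, "Vec4 always starts on a 16-byte boundary");

// Component-wise add. Because every Vec4 is 16-byte aligned and has no
// unused tail, an optimizing compiler can typically lower this to one
// aligned 128-bit load per operand, one vector add, and one store.
inline Vec4 add(const Vec4& a, const Vec4& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// With only three lanes the compiler usually falls back to scalar adds or
// partial loads/stores; in an array of Vec3 (stride 12 bytes) some
// elements also straddle a 16-byte boundary.
inline Vec3 add(const Vec3& a, const Vec3& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}
```

Whether a given compiler actually emits a single vector instruction for the Vec4 case depends on optimization flags and target, so it is worth checking the generated assembly rather than assuming it.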
