I actually think it's easier to optimize Nim for scientific computing, as I did for the Julia challenge: [https://nextjournal.com/sdanisch/the-julia-challenge](https://nextjournal.com/sdanisch/the-julia-challenge)
For example, my [matmul implementation](https://github.com/numforge/laser/blob/bf751f4bbec3d178cd3a80da73e446658d0f8dff/benchmarks/gemm/gemm_bench_float32.nim#L418-L465) would be as fast as Julia Native Threads in [Kostya's matmul benchmark](https://github.com/kostya/benchmarks#matmul) (Julia Native Threads uses the OpenBLAS library, written in assembly, as its backend).
