Note: the API docgen is still not ideal; some internal routines are listed alongside the public API.
> nd-arrays (vectors and matrices) of integer, floating-point, or complex values.

Yes.

> Slicing, concatenation, transposing... these sorts of array operations.

* Tutorial: [https://mratsim.github.io/Arraymancer/tuto.shapeshifting.html](https://mratsim.github.io/Arraymancer/tuto.shapeshifting.html)
* CPU: [https://mratsim.github.io/Arraymancer/shapeshifting.html](https://mratsim.github.io/Arraymancer/shapeshifting.html)
* CUDA: [https://mratsim.github.io/Arraymancer/shapeshifting_cuda.html](https://mratsim.github.io/Arraymancer/shapeshifting_cuda.html)

> Linear algebra (e.g. matrix multiplication, solving linear equations).

Matrix multiplication:

* CPU: [https://mratsim.github.io/Arraymancer/operators_blas_l2l3.html](https://mratsim.github.io/Arraymancer/operators_blas_l2l3.html)
* CUDA: [https://mratsim.github.io/Arraymancer/operators_blas_l2l3_cuda.html](https://mratsim.github.io/Arraymancer/operators_blas_l2l3_cuda.html)
* OpenCL: [https://mratsim.github.io/Arraymancer/operators_blas_l2l3_opencl.html](https://mratsim.github.io/Arraymancer/operators_blas_l2l3_opencl.html)

Solvers, matrix decomposition, PCA, etc. (CPU only at the moment):

* [https://mratsim.github.io/Arraymancer/least_squares.html](https://mratsim.github.io/Arraymancer/least_squares.html)
* [https://mratsim.github.io/Arraymancer/linear_systems.html](https://mratsim.github.io/Arraymancer/linear_systems.html)
* [https://mratsim.github.io/Arraymancer/decomposition.html](https://mratsim.github.io/Arraymancer/decomposition.html)
* [https://mratsim.github.io/Arraymancer/pca.html](https://mratsim.github.io/Arraymancer/pca.html)
* [https://mratsim.github.io/Arraymancer/decomposition_rand.html](https://mratsim.github.io/Arraymancer/decomposition_rand.html)

> 1D FFT, IFFT

Not implemented; wrapping MKL FFT could be a weekend project with c2nim or nimterop:
[https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/appendix-e-code-examples/fourier-transform-functions-code-examples/fft-code-examples.html](https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-c/top/appendix-e-code-examples/fourier-transform-functions-code-examples/fft-code-examples.html)

Implementing a pure Nim FFT is something I want to do at some point, but I lack the time.

> All of the above, running on CPU only (with MKL and/or automated multi-threading, e.g. for large FFT/IFFT)

You can use OpenBLAS or MKL with both Neo and Arraymancer. That said, you can write pure Nim code with performance similar to both OpenBLAS and MKL. I track benchmarks of pure Nim implementations, threaded via Laser (using Nim's OpenMP operators) and via Weave, here:

[https://github.com/mratsim/weave/tree/master/benchmarks/matmul_gemm_blas](https://github.com/mratsim/weave/tree/master/benchmarks/matmul_gemm_blas)

```nim
iterator `||`[S, T](a: S; b: T; annotation: static string = "parallel for"): T
  ## See https://nim-lang.org/docs/system.html#%7C%7C.i%2CS%2CT%2Cstring
iterator `||`[S, T](a: S; b: T; step: Positive; annotation: static string = "parallel for"): T
```

Last time I optimized this, I could reach 2.8 TFlops with Weave, 2.8 TFlops with Laser + OpenMP, and 2.7 TFlops with plain OpenMP, versus 3 TFlops for MKL and 3.1 TFlops for Intel oneDNN ([https://github.com/mratsim/weave/pull/94#issuecomment-571751545](https://github.com/mratsim/weave/pull/94#issuecomment-571751545)), but I started from a single-threaded performance of 160 GFlops vs 200 GFlops for Intel MKL and OpenBLAS, on an 18-core machine.
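As a small sketch of how the `||` OpenMP iterator above is used (plain Nim, no external packages): each loop index is mapped to an OpenMP `parallel for` worker. Compile with `--passC:-fopenmp --passL:-fopenmp` to actually parallelize; without those flags, the pragma is ignored and the loop runs serially, producing the same result.

```nim
# Minimal sketch of Nim's OpenMP `||` iterator.
# Build (parallel): nim c -d:release --passC:-fopenmp --passL:-fopenmp parfor.nim
# Without the OpenMP flags the loop still compiles and runs serially.

proc scaled(n: int): seq[float] =
  ## Fill a seq where each slot only depends on its own index,
  ## so iterations are independent and safe to run in parallel.
  result = newSeq[float](n)
  for i in 0 || (n - 1):
    result[i] = i.float * 2.0

echo scaled(1000)[999]  # 1998.0
```

Note that each iteration writes to a distinct slot; the compiler is not aware of the parallelism, so avoiding shared mutable state across iterations is the programmer's responsibility.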
> GPU

Yes, but minimal; CUDA and OpenCL at the moment.

> Statistical functions

PCA and SVD are well developed, and actually 2x to 10x faster than in any other language (including scikit-learn's latest optimizations and Facebook's PCA):

* [https://github.com/mratsim/Arraymancer/pull/384#issuecomment-536682906](https://github.com/mratsim/Arraymancer/pull/384#issuecomment-536682906)

> Spline, numerical integration and ODE

* [https://github.com/HugoGranstrom/numericalnim](https://github.com/HugoGranstrom/numericalnim)
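To give a flavor of the Arraymancer API discussed above (tensor construction, shapeshifting, and BLAS-backed matrix multiplication), here is a minimal sketch. It assumes the `arraymancer` Nimble package is installed; the names used (`toTensor`, `transpose`, and `*` as 2-D matrix multiplication) follow the documentation linked earlier.

```nim
import arraymancer

let a = [[1.0, 2.0],
         [3.0, 4.0]].toTensor   # 2x2 Tensor[float]
let b = a.transpose             # shapeshifting: transposition
let c = a * b                   # `*` is matrix multiplication for 2-D tensors

echo c  # 2x2 result of A * Aᵀ
```

On CPU this dispatches to the configured BLAS (OpenBLAS or MKL); the CUDA and OpenCL backends expose the same operators on their tensor types.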