I have many examples of using Nim to successfully create my own DSLs:

  * A slicing-syntax DSL replicating NumPy: 
<https://github.com/mratsim/Arraymancer/blob/276cacd/src/arraymancer/tensor/accessors_macros_syntax.nim#L27-L53>
  * A neural-network domain-specific language: 
<https://github.com/mratsim/Arraymancer/blob/276cacd/src/arraymancer/nn/nn_dsl.nim#L312-L332>
  * A compiler running in macros for matrix/image/tensor computations: 
<https://github.com/numforge/laser/blob/e23b5d6/laser/lux_compiler/lux_dsl.nim#L43-L58>,
 write-up in Markdown files: 
<https://github.com/numforge/laser/tree/e23b5d6/laser/lux_compiler/core>. Note: 
the DSL was designed to also work at runtime and, in the future, to be extended 
with an LLVM JIT, similar to <https://github.com/can-lehmann/exprgrad>
  * A for loop over an arbitrary number of inputs: 
<https://github.com/numforge/laser/blob/e23b5d6/benchmarks/loop_iteration/iter05_fusedpertensor.nim#L156-L159>

See also <https://github.com/mratsim/compute-graph-optim> for the progressive 
experimental steps from a for loop over an arbitrary number of inputs to a 
matrix-expression compiler running in macros.
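To give a flavor of that first step, here is a minimal sketch of my own (illustrative only, not code from the linked repos); the `forEach` name and its exact shape are assumptions:

```nim
import std/macros

# Illustrative sketch: a macro that fuses iteration over an arbitrary
# number of sequences into a single loop, in the spirit of the
# multi-input iteration experiments linked above.
macro forEach(args: varargs[untyped]): untyped =
  ## Usage: forEach x in a, y in b: body
  ## All arguments except the last are `ident in container` pairs;
  ## the last argument is the loop body.
  let body = args[args.len - 1]
  let idx = genSym(nskForVar, "i")     # fresh, hygienic loop index
  var inner = newStmtList()
  for i in 0 ..< args.len - 1:
    args[i].expectKind nnkInfix        # expects the form `x in a`
    inner.add newLetStmt(args[i][1], nnkBracketExpr.newTree(args[i][2], idx))
  inner.add body
  let first = args[0][2]               # iteration bound: first container
  result = quote do:
    for `idx` in 0 ..< `first`.len:
      `inner`

let a = @[1, 2, 3]
let b = @[10, 20, 30]
var sums: seq[int]
forEach x in a, y in b:
  sums.add x + y
```

The point is that the user-facing syntax stays loop-like while the macro sees only an AST it is free to restructure, which is exactly the property the compute-graph experiments build on.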

  * Parallel forLoop DSL: 
<https://github.com/mratsim/weave/blob/b76e9ff/benchmarks/matmul_gemm_blas/gemm_pure_nim/gemm_weave.nim#L168-L169>
  * DSL for SIMD: 
<https://github.com/mratsim/weave/blob/b76e9ff/benchmarks/matmul_gemm_blas/gemm_pure_nim/common/gemm_ukernel_avx512.nim#L12-L40>,
 note that my pure-Nim code is faster than 
[OpenBLAS](https://github.com/xianyi/OpenBLAS)+OpenMP, which is 90% assembly 
(see <https://github.com/mratsim/weave/pull/94>: OpenBLAS sits at 2.7 TFlops 
while Laser+Weave reaches 2.8 TFlops)
  * A DSL for opcode description for a JIT: 
<https://github.com/mratsim/photon-jit/blob/747ae2d/photon_jit/x86_64/x86_64_ops.nim#L24-L51>,
 other JIT libraries use a two-phase compilation process, either JavaScript 
that parses opcodes and generates C++ 
(<https://github.com/asmjit/asmjit/blob/8f2c237/tools/tablegen-x86.js>) or C++ 
that parses opcodes and generates C++ 
(<https://github.com/herumi/xbyak/blob/f8ea5c2/gen/gen_code.cpp>)
  * An assembler for bigints and cryptography, for example bigint 
multiplication: 
<https://github.com/mratsim/constantine/blob/928f515/constantine/math/arithmetic/assembly/limbs_asm_mul_x86.nim#L91-L125>.
 The resulting library ranges from matching the state of the art 
(<https://github.com/mratsim/constantine/pull/206>) to 15% faster 
(<https://github.com/mratsim/constantine/pull/183>) or even 33% faster 
(<https://github.com/mratsim/constantine/pull/207>). The state of the art here 
is OpenSSL and BLST (written by OpenSSL and Intel crypto-assembly devs), both 
of which use perlasm for assembly codegen: 
<https://github.com/supranational/blst/blob/6382d67/src/asm/mul_mont_384-armv8.pl>
 and 
<https://github.com/openssl/openssl/blob/d8eb0e1/crypto/sha/asm/sha256-586.pl>
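All of the DSLs above rest on the same mechanism: an untyped macro receives a declarative block as an AST and emits ordinary Nim code at compile time. A minimal illustration of my own (the `constants` macro and its `name = expr` grammar are hypothetical, not from any linked repo):

```nim
import std/macros

# Hypothetical minimal DSL: turn a declarative `name = expression`
# block into procs at compile time. The linked DSLs build on this
# same untyped-macro mechanism, just with far richer grammars.
macro constants(body: untyped): untyped =
  result = newStmtList()
  for line in body:
    line.expectKind nnkAsgn          # each line must be `name = expr`
    let name = line[0]
    let value = line[1]
    result.add quote do:
      proc `name`(): int = `value`   # emit `proc name(): int = expr`

constants:
  answer = 42
  double = 21 * 2
```

Because the block is passed untyped, the macro gets full control over what the "syntax" means before the compiler ever type-checks it, which is what makes slicing syntax, opcode tables, and assembly generation all expressible in plain Nim.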
