I have many examples of using Nim to successfully create my own DSLs:

* A slicing-syntax DSL replicating NumPy: <https://github.com/mratsim/Arraymancer/blob/276cacd/src/arraymancer/tensor/accessors_macros_syntax.nim#L27-L53>
* A neural-network domain-specific language: <https://github.com/mratsim/Arraymancer/blob/276cacd/src/arraymancer/nn/nn_dsl.nim#L312-L332>
* A compiler running in macros for matrix/image/tensor computations: <https://github.com/numforge/laser/blob/e23b5d6/laser/lux_compiler/lux_dsl.nim#L43-L58>, with a writeup in markdown files: <https://github.com/numforge/laser/tree/e23b5d6/laser/lux_compiler/core>. Note: the DSL was designed to also work at runtime and, in the future, to be extended with an LLVM JIT, similar to <https://github.com/can-lehmann/exprgrad>.
* A for loop over an arbitrary number of inputs: <https://github.com/numforge/laser/blob/e23b5d6/benchmarks/loop_iteration/iter05_fusedpertensor.nim#L156-L159>. See also <https://github.com/mratsim/compute-graph-optim> for the progressive experimental steps from a for loop over an arbitrary number of inputs to a matrix-expression compiler running in macros.
* A parallel for-loop DSL: <https://github.com/mratsim/weave/blob/b76e9ff/benchmarks/matmul_gemm_blas/gemm_pure_nim/gemm_weave.nim#L168-L169>
* A DSL for SIMD: <https://github.com/mratsim/weave/blob/b76e9ff/benchmarks/matmul_gemm_blas/gemm_pure_nim/common/gemm_ukernel_avx512.nim#L12-L40>. Note that my pure Nim code is faster than [OpenBLAS](https://github.com/xianyi/OpenBLAS)+OpenMP, which is 90% assembly (see <https://github.com/mratsim/weave/pull/94>: OpenBLAS sits at 2.7 TFlops while Laser+Weave reaches 2.8 TFlops).
* A DSL for opcode descriptions for a JIT: <https://github.com/mratsim/photon-jit/blob/747ae2d/photon_jit/x86_64/x86_64_ops.nim#L24-L51>. Other JIT libraries use a 2-phase compilation process, either JavaScript parsing opcodes and generating C++ (<https://github.com/asmjit/asmjit/blob/8f2c237/tools/tablegen-x86.js>) or C++ parsing opcodes and generating C++ (<https://github.com/herumi/xbyak/blob/f8ea5c2/gen/gen_code.cpp>).
* An assembler for bigints and cryptography, for example bigint multiplication: <https://github.com/mratsim/constantine/blob/928f515/constantine/math/arithmetic/assembly/limbs_asm_mul_x86.nim#L91-L125>. The resulting library ranges from as fast as (<https://github.com/mratsim/constantine/pull/206>) to 15% faster (<https://github.com/mratsim/constantine/pull/183>) and 33% faster (<https://github.com/mratsim/constantine/pull/207>) than the state of the art, OpenSSL or BLST (written by OpenSSL and Intel crypto-assembly devs), which use perlasm for assembly codegen: <https://github.com/supranational/blst/blob/6382d67/src/asm/mul_mont_384-armv8.pl> and <https://github.com/openssl/openssl/blob/d8eb0e1/crypto/sha/asm/sha256-586.pl>.
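To give a flavor of the macro technique underlying these DSLs, here is a minimal, self-contained sketch (not taken from any of the repos above; `unroll` is a hypothetical name): a macro that walks the untyped AST of its body and substitutes the loop variable with compile-time literals, a toy version of the loop-generation tricks used in the loop-fusion and SIMD kernels linked above.

```nim
import std/macros

# Toy DSL macro: expand `body` `count` times at compile time, replacing every
# occurrence of the identifier `idx` with the literal iteration number.
macro unroll(idx: untyped; count: static int; body: untyped): untyped =
  result = newStmtList()
  for i in 0 ..< count:
    # Recursively copy the body, swapping `idx` for the literal `i`.
    proc subst(n: NimNode): NimNode =
      if n.kind == nnkIdent and n.eqIdent(idx):
        return newLit(i)
      result = n.copyNimNode
      for child in n:
        result.add subst(child)
    result.add subst(body)

var total = 0
unroll(i, 4):
  total += i * i        # expands to: total += 0*0; total += 1*1; ...
doAssert total == 14    # 0 + 1 + 4 + 9
```

The real DSLs above apply the same idea at a larger scale: they pattern-match the AST of a user-facing syntax and emit specialized, fused, or SIMD-annotated code instead of simple literal substitution.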
