You're welcome to join the SciNim chat: [https://gitter.im/SciNim/community](https://gitter.im/SciNim/community)
Regarding embedded and metaprogramming examples, you might be interested in my [Synthesis](https://github.com/mratsim/synthesis) repo. It's a state machine generator implemented as a custom DSL in Nim, with a Graphviz backend. It's very high performance, you probably can't beat it with pure C: no allocation at all, no indirect dispatch via tables or switch, and the generated code is purely goto-based, which avoids the branch mispredictions you get when a single dispatch point confuses the hardware predictors.

Regarding science, you have probably come across [Arraymancer](https://github.com/mratsim/Arraymancer) and [ggplotnim](https://github.com/Vindaar/ggplotnim). Be sure to check the [Are we scientist yet?](https://github.com/nim-lang/needed-libraries/issues/77) thread.

And if you want to see an example of metaprogramming in Nim vs Julia, you can check [my submission](https://github.com/SimonDanisch/julia-challenge/blob/master/nim/nim_sol_mratsim.nim) to the [Julia metaprogramming challenge](https://nextjournal.com/sdanisch/the-julia-challenge). I.e. in 200 lines of code, you have a multidimensional array/tensor type with support for any number of dimensions, broadcasting (the Julia dot operator) and iteration over a variadic number of tensors.
I've also made the code about 40% faster when [iterating on strided tensors resulting from slices in Laser](https://github.com/numforge/laser#loop-fusion-and-strided-iterators-for-matrix-and-tensors).

Depending on your embedded devices, you might also want to develop an assembler. Nim macros make it possible to create a [DSL to map the instructions](https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/laser/photon_jit/x86_64/x86_64_ops.nim), for example for x86:

```nim
# Notes:
#   - The imm64 version will generate a proc for uint64 and int64
#     and another one for pointers immediate
#   - The dst64, imm32 version will generate a proc for uint32 and int32
#     and a proc for int literals (known at compile-time)
#     that will call proc(reg, imm32) if the int is small enough.
#     ---> (dst64, imm64) should be defined before (dst64, imm32)

op_generator:
  op MOV: # MOV(dst, src) load/copy src into destination
    ## Copy 64-bit register content to another register
    [dst64, src64]: [rex(w=1), 0x89, modrm(Direct, reg = src64, rm = dst64)]
    ## Copy 32-bit register content to another register
    [dst32, src32]: [          0x89, modrm(Direct, reg = src32, rm = dst32)]
    ## Copy 16-bit register content to another register
    [dst16, src16]: [    0x66, 0x89, modrm(Direct, reg = src16, rm = dst16)]
    ## Copy 8-bit register content to another register
    [dst8, src8]:   [          0x88, modrm(Direct, reg = src8, rm = dst8)]

    ## Copy 64-bit immediate value into register
    [dst64, imm64]: [rex(w=1), 0xB8 + dst64] & imm64
    ## Copy 32-bit immediate value into register
    [dst64, imm32]: [          0xB8 + dst64] & imm32
    ## Copy 16-bit immediate value into register
    [dst64, imm16]: [    0x66, 0xB8 + dst64] & imm16

    ## Copy 32-bit immediate value into register
    [dst32, imm32]: [          0xB8 + dst32] & imm32
    ## Copy 16-bit immediate value into register
    [dst32, imm16]: [    0x66, 0xB8 + dst32] & imm16

    ## Copy 16-bit immediate value into register
    [dst16, imm16]: [    0x66, 0xB8 + dst16] & imm16
    ## Copy 8-bit immediate value into register
    [dst8, imm8]:   [          0xB0 + dst8, imm8]

  op LEA:
    ## Load effective address of the target label into a register
    [dst64, label]: [rex(w=1), 0x8D, modrm(Direct, reg = dst64, rm = rbp)]

  op CMP:
    ## Compare 32-bit immediate with 32-bit int at memory location stored in adr register
    [adr, imm64]: [ rex(w=1), 0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm64
    ## Compare 32-bit immediate with 32-bit int at memory location stored in adr register
    [adr, imm32]: [           0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm32
    ## Compare 16-bit immediate with 16-bit int at memory location stored in adr register
    [adr, imm16]: [     0x66, 0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm16
    ## Compare 8-bit immediate with byte at memory location stored in adr register
    [adr, imm8]:  [           0x80, modrm(Indirect, opcode_ext = 7, rm = adr[0]), imm8]

  op JZ:
    ## Jump to label if zero flag is set
    [label]: [0x0F, 0x84]
  op JNZ:
    ## Jump to label if zero flag is not set
    [label]: [0x0F, 0x85]

  op INC:
    ## Increment register by 1. Carry flag is never updated.
    [dst64]: [rex(w=1), 0xFF, modrm(Direct, opcode_ext = 0, rm = dst64)]
    [dst32]: [          0xFF, modrm(Direct, opcode_ext = 0, rm = dst32)]
    [dst16]: [    0x66, 0xFF, modrm(Direct, opcode_ext = 0, rm = dst16)]
    [dst8]:  [          0xFE, modrm(Direct, opcode_ext = 0, rm = dst8)]
    ## Increment data at the address by 1. Data type must be specified.
    [adr, type(64)]: [rex(w=1), 0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
    [adr, type(32)]: [          0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
    [adr, type(16)]: [    0x66, 0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
    [adr, type(8)]:  [0xFE, modrm(Indirect, opcode_ext = 0, rm = adr[0])]

  op DEC:
    ## Decrement register by 1. Carry flag is never updated.
    [dst64]: [rex(w=1), 0xFF, modrm(Direct, opcode_ext = 1, rm = dst64)]
    [dst32]: [          0xFF, modrm(Direct, opcode_ext = 1, rm = dst32)]
    [dst16]: [    0x66, 0xFF, modrm(Direct, opcode_ext = 1, rm = dst16)]
    [dst8]:  [          0xFE, modrm(Direct, opcode_ext = 1, rm = dst8)]
    ## Decrement data at the address by 1. Data type must be specified.
    [adr, type(64)]: [rex(w=1), 0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
    [adr, type(32)]: [          0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
    [adr, type(16)]: [    0x66, 0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
    [adr, type(8)]:  [0xFE, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
```

And usage for a [brainfuck JIT assembler](https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/examples/ex07_jit_brainfuck_vm.nim#L62-L84) (complete with clobbered registers cleanup):

```nim
while not stream.atEnd():
  case stream.readChar()
  of '>': a.inc rbx           # Pointer increment
  of '<': a.dec rbx           # Pointer decrement
  of '+': a.inc [rbx], uint8  # Memory increment
  of '-': a.dec [rbx], uint8  # Memory decrement
  of '.': a.os_write()        # Print
  of ',': a.os_read()         # Read from stdin
  of '[':                     # If mem == 0, Skip block to corresponding ']'
    let
      loop_start = initLabel()
      loop_end   = initLabel()
    a.cmp [rbx], uint8 0
    a.jz loop_end
    a.label loop_start
    stack.add (loop_start, loop_end)
  of ']':
    let (loop_start, loop_end) = stack.pop()
    a.cmp [rbx], uint8 0
    a.jnz loop_start
    a.label loop_end
  else:
    discard
```

I have plenty of other metaprogramming examples:

* Neural network DSL
* Simulating classes with ADTs
* Creating a compiler for linear algebra and deep-learning
* Creating matrix multiplication kernels as fast or faster than (pure assembly) OpenBLAS
* Recreating the OpenMP syntax for multithreading
* Implementing Einstein Summation
* ...

So ask away.
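If it helps to see what a single row of the `op_generator` table above boils down to, here is a minimal standalone sketch of the register-direct encodings for `MOV`, `INC` and `DEC` on 64-bit registers. It is not Laser's actual API: the names `Reg64`, `RexW`, `ModDirect`, `movRegReg`, `incReg` and `decReg` are made up for illustration, the `modrm` helper here is a simplified stand-in for the one used in the DSL, and only the first eight registers are covered (r8 to r15 would also need the REX.R/REX.B bits).

```nim
# Low 3-bit hardware encodings of the first eight 64-bit registers.
# (Hypothetical type for this sketch, not the one defined in Laser.)
type
  Reg64 = enum
    rax = 0, rcx, rdx, rbx, rsp, rbp, rsi, rdi

const
  RexW      = 0x48'u8  # REX prefix with W=1: 64-bit operand size
  ModDirect = 0b11'u8  # ModRM.mod = 11: register-direct addressing

func modrm(reg, rm: uint8): uint8 =
  ## Pack a register-direct ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits).
  (ModDirect shl 6) or (reg shl 3) or rm

func movRegReg(dst, src: Reg64): array[3, uint8] =
  ## MOV dst64, src64 -> REX.W, 0x89, ModRM with src in `reg` and dst in `rm`.
  [RexW, 0x89'u8, modrm(uint8(ord(src)), uint8(ord(dst)))]

func incReg(dst: Reg64): array[3, uint8] =
  ## INC dst64 -> REX.W, 0xFF, ModRM with opcode extension 0 in the `reg` field.
  [RexW, 0xFF'u8, modrm(0'u8, uint8(ord(dst)))]

func decReg(dst: Reg64): array[3, uint8] =
  ## DEC dst64 -> REX.W, 0xFF, ModRM with opcode extension 1 in the `reg` field.
  [RexW, 0xFF'u8, modrm(1'u8, uint8(ord(dst)))]

# `mov rax, rbx` assembles to the bytes 48 89 D8 (hex).
doAssert movRegReg(rax, rbx) == [0x48'u8, 0x89'u8, 0xD8'u8]
# `inc rbx` and `dec rbx` differ only in the opcode extension bits.
doAssert incReg(rbx) == [0x48'u8, 0xFF'u8, 0xC3'u8]
doAssert decReg(rbx) == [0x48'u8, 0xFF'u8, 0xCB'u8]
```

The point of the macro-based DSL is that you write the table once and the procs for every operand size get generated for you, instead of hand-writing one function per row like in this sketch.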