You're welcome to join the SciNim chat: 
[https://gitter.im/SciNim/community](https://gitter.im/SciNim/community)

Regarding embedded and metaprogramming examples, you might be interested in my 
[Synthesis](https://github.com/mratsim/synthesis) repo. It's a state-machine 
generator implemented as a custom DSL in Nim, with a Graphviz backend. It's very 
high-performance, and you probably can't beat it with pure C: there are no allocations at all 
and no indirect dispatch via tables or switch. The generated code is purely 
goto-based, which avoids the branch mispredictions caused by a single dispatch 
point that confuses the hardware predictors.
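Synthesis generates its own goto-based code, which plain Nim cannot express directly because Nim has no `goto` statement. A rough stand-in, sketched below only to illustrate the "single dispatch point" issue, is Nim's built-in `{.computedGoto.}` pragma: with the C backend on GCC/Clang it compiles a `case` inside `while true` into a table of label addresses and re-dispatches at the end of every branch, so each state gets its own jump site. Unlike Synthesis, this still jumps indirectly through a table. The word-counting machine and its names (`State`, `countWords`) are hypothetical.

```nim
# Hypothetical example states; computedGoto needs an enum starting at 0.
type
  State = enum
    stSpace, stWord, stDone

proc countWords(s: string): int =
  ## Counts space-separated words with an explicit state machine.
  var state = stSpace
  var i = 0
  while true:
    # With the C backend this asks the compiler to dispatch through a
    # table of label addresses, re-dispatching at the end of each branch
    # instead of jumping back to one shared `switch`.
    {.computedGoto.}
    case state
    of stSpace:
      if i >= s.len:
        state = stDone
      elif s[i] == ' ':
        inc i
      else:
        inc result       # first character of a new word
        inc i
        state = stWord
    of stWord:
      if i >= s.len:
        state = stDone
      elif s[i] == ' ':
        inc i
        state = stSpace
      else:
        inc i
    of stDone:
      break
```

On backends where computed gotos are unavailable, the pragma does not change the program's behavior; it only affects how the dispatch is compiled.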

Regarding science, you've probably come across 
[Arraymancer](https://github.com/mratsim/Arraymancer) and 
[ggplotnim](https://github.com/Vindaar/ggplotnim).

Be sure to check the 
[Are we scientist yet?](https://github.com/nim-lang/needed-libraries/issues/77) thread.

And if you want to see an example of metaprogramming in Nim vs Julia, you can 
check [my submission](https://github.com/SimonDanisch/julia-challenge/blob/master/nim/nim_sol_mratsim.nim) 
to the [Julia metaprogramming challenge](https://nextjournal.com/sdanisch/the-julia-challenge).

In 200 lines of code, it provides a multidimensional array/tensor type that 
supports any number of dimensions, broadcasting (Julia's dot operator) 
and iteration over a variadic number of tensors.
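The submission itself is linked above. As an independent, much-simplified sketch (not the submission's code), the broadcasting rule can be shown with a runtime-rank tensor stored as shape, strides, offset and a flat seq: broadcasting gives stride 0 to new or size-1 axes, and a multi-dimensional counter drives an element-wise operation. The names (`Tensor`, `fromSeq`, `broadcastShape`, `broadcastTo`, `map2`) are hypothetical, ranks are runtime values rather than compile-time ones, and only two tensors are combined rather than a variadic number.

```nim
# Hypothetical minimal strided tensor: runtime rank, row-major storage.
type
  Tensor[T] = object
    shape, strides: seq[int]
    offset: int
    data: seq[T]

proc rowMajorStrides(shape: seq[int]): seq[int] =
  # Last axis is contiguous (stride 1); earlier strides are products of later sizes.
  result = newSeq[int](shape.len)
  var acc = 1
  for i in countdown(shape.high, 0):
    result[i] = acc
    acc *= shape[i]

proc fromSeq[T](data: seq[T], shape: seq[int]): Tensor[T] =
  Tensor[T](shape: shape, strides: rowMajorStrides(shape), offset: 0, data: data)

proc broadcastShape(a, b: seq[int]): seq[int] =
  # Align shapes on the right; each pair of sizes must be equal or contain a 1.
  let rank = max(a.len, b.len)
  result = newSeq[int](rank)
  for i in 0 ..< rank:
    let da = if i < rank - a.len: 1 else: a[i - (rank - a.len)]
    let db = if i < rank - b.len: 1 else: b[i - (rank - b.len)]
    if da != db and da != 1 and db != 1:
      raise newException(ValueError, "shapes are not broadcastable")
    result[i] = max(da, db)

proc broadcastTo[T](t: Tensor[T], shape: seq[int]): Tensor[T] =
  # New leading axes and size-1 axes keep stride 0, so the same element is
  # read again along that axis. (The flat `data` seq is copied for simplicity.)
  let pad = shape.len - t.shape.len
  result = Tensor[T](shape: shape, strides: newSeq[int](shape.len),
                     offset: t.offset, data: t.data)
  for i in 0 ..< t.shape.len:
    if t.shape[i] == shape[i + pad]:
      result.strides[i + pad] = t.strides[i]
    elif t.shape[i] != 1:
      raise newException(ValueError, "shapes are not broadcastable")

proc map2[T](a, b: Tensor[T], f: proc (x, y: T): T): Tensor[T] =
  # Element-wise binary operation with broadcasting, for any rank.
  let shape = broadcastShape(a.shape, b.shape)
  let ba = a.broadcastTo(shape)
  let bb = b.broadcastTo(shape)
  var size = 1
  for d in shape: size *= d
  result = fromSeq(newSeq[T](size), shape)
  var idx = newSeq[int](shape.len)   # multi-dimensional counter
  for k in 0 ..< size:
    var oa = ba.offset
    var ob = bb.offset
    for d in 0 ..< shape.len:
      oa += idx[d] * ba.strides[d]
      ob += idx[d] * bb.strides[d]
    result.data[k] = f(ba.data[oa], bb.data[ob])
    # Advance the counter, last axis fastest.
    var axis = shape.high
    while axis >= 0:
      inc idx[axis]
      if idx[axis] < shape[axis]: break
      idx[axis] = 0
      dec axis
```

With `a` of shape `[2, 3]`, a tensor of shape `[3]` is added to every row and one of shape `[2, 1]` to every column; the zero strides mean the smaller tensor is never expanded in memory.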

I've also made the code about 40% faster when [iterating on strided tensors 
resulting from slices in 
Laser](https://github.com/numforge/laser#loop-fusion-and-strided-iterators-for-matrix-and-tensors).
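Laser's own implementation is not reproduced here. As a sketch of one common strided-iteration technique, assuming a view described by `shape`, `strides` and `offset` (hypothetical parameter names), the iterator below updates the storage offset incrementally: one addition per element, plus a correction when an axis wraps around. It avoids recomputing the full dot product of coordinates and strides for every element.

```nim
iterator stridedOffsets(shape, strides: seq[int], offset: int): int =
  # Yields the storage offset of each element of a strided view,
  # in row-major logical order.
  var size = 1
  for s in shape: size *= s
  var coord = newSeq[int](shape.len)
  var pos = offset
  for _ in 0 ..< size:
    yield pos
    var axis = shape.high
    while axis >= 0:
      if coord[axis] + 1 < shape[axis]:
        # Common case: step along the innermost axis that has room left.
        inc coord[axis]
        pos += strides[axis]
        break
      # This axis wraps: undo its contribution and carry to the next axis.
      pos -= coord[axis] * strides[axis]
      coord[axis] = 0
      dec axis
```

For example, rows 0 and 2, columns 1 and 2 of a 3x4 row-major buffer form a view with shape `[2, 2]`, strides `[8, 1]` and offset `1`, which yields offsets 1, 2, 9, 10.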

Depending on your embedded devices, you might also want to develop an 
assembler: Nim macros make it possible to create a [DSL that maps the 
instructions](https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/laser/photon_jit/x86_64/x86_64_ops.nim), 
for example for x86:
    
    
    # Notes:
    #   - The imm64 version will generate a proc for uint64 and int64
    #     and another one for pointers immediate
    #   - The dst64, imm32 version will generate a proc for uint32 and int32
    #     and a proc for int literals (known at compile-time)
    #     that will call proc(reg, imm32) if the int is small enough.
    #     ---> (dst64, imm64) should be defined before (dst64, imm32)
    
    op_generator:
      op MOV: # MOV(dst, src) load/copy src into destination
        ## Copy 64-bit register content to another register
        [dst64, src64]: [rex(w=1), 0x89, modrm(Direct, reg = src64, rm = dst64)]
        ## Copy 32-bit register content to another register
        [dst32, src32]: [          0x89, modrm(Direct, reg = src32, rm = dst32)]
        ## Copy 16-bit register content to another register
        [dst16, src16]: [    0x66, 0x89, modrm(Direct, reg = src16, rm = dst16)]
        
        ## Copy  8-bit register content to another register
        [dst8,  src8]:  [          0x88, modrm(Direct, reg = src8, rm = dst8)]
        
        ## Copy 64-bit immediate value into register
        [dst64, imm64]: [rex(w=1), 0xB8 + dst64] & imm64
        ## Copy 32-bit immediate value into register
        [dst64, imm32]: [          0xB8 + dst64] & imm32
        ## Copy 16-bit immediate value into register
        [dst64, imm16]: [    0x66, 0xB8 + dst64] & imm16
        
        ## Copy 32-bit immediate value into register
        [dst32, imm32]: [          0xB8 + dst32] & imm32
        ## Copy 16-bit immediate value into register
        [dst32, imm16]: [    0x66, 0xB8 + dst32] & imm16
        
        ## Copy 16-bit immediate value into register
        [dst16, imm16]: [    0x66, 0xB8 + dst16] & imm16
        ## Copy  8-bit immediate value into register
        [dst8,  imm8]:  [          0xB0 + dst8, imm8]
      
      op LEA:
        ## Load effective address of the target label into a register
        [dst64, label]: [rex(w=1), 0x8D, modrm(Direct, reg = dst64, rm = rbp)]
      
      op CMP:
        ## Compare 32-bit immediate with 32-bit int at memory location stored in adr register
        [adr, imm64]: [ rex(w=1), 0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm64
        ## Compare 32-bit immediate with 32-bit int at memory location stored in adr register
        [adr, imm32]: [           0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm32
        ## Compare 16-bit immediate with 16-bit int at memory location stored in adr register
        [adr, imm16]: [     0x66, 0x81, modrm(Indirect, opcode_ext = 7, rm = adr[0])] & imm16
        ## Compare 8-bit immediate with byte at memory location stored in adr register
        [adr, imm8]:  [           0x80, modrm(Indirect, opcode_ext = 7, rm = adr[0]), imm8]
      
      op JZ:
        ## Jump to label if zero flag is set
        [label]: [0x0F, 0x84]
      op JNZ:
        ## Jump to label if zero flag is not set
        [label]: [0x0F, 0x85]
      
      op INC:
        ## Increment register by 1. Carry flag is never updated.
        [dst64]: [rex(w=1), 0xFF, modrm(Direct, opcode_ext = 0, rm = dst64)]
        [dst32]: [          0xFF, modrm(Direct, opcode_ext = 0, rm = dst32)]
        [dst16]: [    0x66, 0xFF, modrm(Direct, opcode_ext = 0, rm = dst16)]
        [dst8]:  [          0xFE, modrm(Direct, opcode_ext = 0, rm = dst8)]
        ## Increment data at the address by 1. Data type must be specified.
        [adr, type(64)]: [rex(w=1), 0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
        [adr, type(32)]: [          0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
        [adr, type(16)]: [    0x66, 0xFF, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
        [adr, type(8)]:  [0xFE, modrm(Indirect, opcode_ext = 0, rm = adr[0])]
      
      op DEC:
        ## Decrement register by 1. Carry flag is never updated.
        [dst64]: [rex(w=1), 0xFF, modrm(Direct, opcode_ext = 1, rm = dst64)]
        [dst32]: [          0xFF, modrm(Direct, opcode_ext = 1, rm = dst32)]
        [dst16]: [    0x66, 0xFF, modrm(Direct, opcode_ext = 1, rm = dst16)]
        [dst8]:  [          0xFE, modrm(Direct, opcode_ext = 1, rm = dst8)]
        ## Decrement data at the address by 1. Data type must be specified.
        [adr, type(64)]: [rex(w=1), 0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
        [adr, type(32)]: [          0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
        [adr, type(16)]: [    0x66, 0xFF, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
        [adr, type(8)]:  [0xFE, modrm(Indirect, opcode_ext = 1, rm = adr[0])]
    
    
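To make one row of that table concrete, here is a hand-written sketch of the bytes that `[dst64, src64]: [rex(w=1), 0x89, modrm(Direct, reg = src64, rm = dst64)]` describes, following the standard x86-64 encoding of `MOV r/m64, r64` (REX prefix `0100WRXB`, then opcode `0x89`, then a ModRM byte). The names `Reg64`, `rex`, `modrmDirect` and `movRegReg` are hypothetical helpers for illustration, not Laser's API.

```nim
type
  Reg64 = enum
    # Register numbers as encoded in ModRM/REX (hypothetical helper type).
    rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi,
    r8, r9, r10, r11, r12, r13, r14, r15

proc rex(w, r, x, b: bool): byte =
  # REX prefix bit layout: 0100 W R X B
  result = 0x40
  if w: result = result or 0b1000
  if r: result = result or 0b0100
  if x: result = result or 0b0010
  if b: result = result or 0b0001

proc modrmDirect(reg, rm: Reg64): byte =
  # ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits); mod = 0b11 means
  # register-direct. Bit 3 of each register number goes into REX.R / REX.B.
  byte((0b11 shl 6) or ((ord(reg) and 7) shl 3) or (ord(rm) and 7))

proc movRegReg(dst, src: Reg64): seq[byte] =
  # Hand-expanded counterpart of the DSL line
  #   [dst64, src64]: [rex(w=1), 0x89, modrm(Direct, reg = src64, rm = dst64)]
  @[rex(w = true, r = ord(src) >= 8, x = false, b = ord(dst) >= 8),
    0x89'u8,
    modrmDirect(reg = src, rm = dst)]
```

For instance, `movRegReg(rbx, rax)` gives `48 89 C3`, the usual encoding of `mov rbx, rax`. A generator macro's job is essentially to emit procs like this one for every row of the table.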

And here is how it is used in a [brainfuck JIT 
assembler](https://github.com/numforge/laser/blob/d1e6ae6106564bfb350d4e566261df97dbb578b3/examples/ex07_jit_brainfuck_vm.nim#L62-L84) 
(complete with clobbered-registers cleanup):
    
    
    while not stream.atEnd():
      case stream.readChar()
      of '>': a.inc rbx          # Pointer increment
      of '<': a.dec rbx          # Pointer decrement
      of '+': a.inc [rbx], uint8 # Memory increment
      of '-': a.dec [rbx], uint8 # Memory decrement
      of '.': a.os_write()       # Print
      of ',': a.os_read()        # Read from stdin
      of '[':                    # If mem == 0, skip block to corresponding ']'
        let
          loop_start = initLabel()
          loop_end   = initLabel()
        a.cmp [rbx], uint8 0
        a.jz loop_end
        a.label loop_start
        stack.add (loop_start, loop_end)
      of ']':                    # If mem != 0, jump back to the matching '['
        let (loop_start, loop_end) = stack.pop()
        a.cmp [rbx], uint8 0
        a.jnz loop_start
        a.label loop_end
      else:
        discard
    
    
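For comparison, the same `case` dispatch can drive a plain interpreter instead of emitting machine code. The sketch below pairs brackets with a stack in a pre-pass, much as the JIT pairs `loop_start`/`loop_end` labels, and then interprets the program. The proc name `runBf`, the 30000-cell tape, 8-bit wrapping cells and "EOF reads as 0" are assumptions of this sketch, not taken from the Laser example.

```nim
proc runBf(code: string, input = ""): string =
  # Plain brainfuck interpreter (no JIT); output is collected into a string.
  # Pre-pass: pair brackets with a stack, as the JIT pairs its labels.
  var jump = newSeq[int](code.len)
  var stack: seq[int]
  for i, c in code:
    case c
    of '[':
      stack.add i
    of ']':
      if stack.len == 0:
        raise newException(ValueError, "unmatched ']'")
      let start = stack.pop()
      jump[start] = i
      jump[i] = start
    else:
      discard
  if stack.len != 0:
    raise newException(ValueError, "unmatched '['")

  var mem = newSeq[uint8](30_000)
  var p = 0      # data pointer (rbx in the JIT)
  var pc = 0     # program counter
  var inPos = 0  # next character of `input`
  while pc < code.len:
    case code[pc]
    of '>': inc p
    of '<': dec p
    of '+': mem[p] = mem[p] + 1        # unsigned: wraps 255 -> 0
    of '-': mem[p] = mem[p] - 1        # unsigned: wraps 0 -> 255
    of '.': result.add char(mem[p])
    of ',':
      if inPos < input.len:
        mem[p] = uint8(input[inPos])
        inc inPos
      else:
        mem[p] = 0                     # assumption: EOF reads as 0
    of '[':
      if mem[p] == 0: pc = jump[pc]    # skip to the matching ']'
    of ']':
      if mem[p] != 0: pc = jump[pc]    # back to the matching '['
    else:
      discard
    inc pc
```

For example, `runBf("++++++++[>++++++++<-]>+.")` computes 8 × 8 + 1 = 65 in the second cell and prints `A`. The JIT removes exactly this per-character dispatch loop by emitting the corresponding instructions once.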

I have plenty of other metaprogramming examples:

  * Neural network DSL
  * Simulating classes with ADTs
  * Creating a compiler for linear algebra and deep-learning
  * Creating matrix multiplication kernels as fast as or faster than OpenBLAS 
(pure assembly)
  * Recreating the OpenMP syntax for multithreading
  * Implementing Einstein Summation
  * ...

So ask away!