Hi Jakob,

while LLVM is surely an interesting project, I think it would be
overkill by far for what we need here.

What I'm doing is extremely simple. I defined a (mostly) single-address
assembly language, with a rather basic set of instructions which I
believe can easily be mapped to differing architectures. At least, there
are no assumptions about arity, register sets or instruction formats.
And, most important, it is quite readable (as opposed to most current
assembly languages, including LLVM)!

Though I think it is still too early to publish, let me show you two
examples from the current sources ('car' and 'if').

   # (c....r 'lst) -> any
   (code 'doCarE_E 2)
      push X
      ldX E
      ldE (E CDR)  # Evaluate CADR
      ldE (E)
      eval
      num? E  # Check list
      jnz lstErrEX
      ldE (E)  # Take CAR
      pop X
      ret

On x86-64 this expands to:

      .balign  16
      nop
      nop
      .global  doCarE_E
   doCarE_E:
      pushq    %r13
      movq     %rbx, %r13
      movq     8(%rbx), %rbx
      movq     (%rbx), %rbx
      test     $0x06, %bl
      jnz      2f
      test     $0x08, %bl
      jnz      1f
      call     evListE_E
      jmp      2f
   1:
      movq     (%rbx), %rbx
   2:
      testb    $0x06, %bl
      jnz      lstErrEX
      movq     (%rbx), %rbx
      popq     %r13
      ret

'if' is a little longer:

   # (if 'any1 'any2 . prg) -> any
   (code 'doIfE_E 2)
      ldE (E CDR)  # Body
      push (E CDR)  # Push rest
      ldE (E)  # Get condition
      eval  # Eval condition
      nil? E
      if ne  # Non-NIL
         stE (AtSym)
         pop E  # Get rest
         ldE (E)  # Consequent
         eval/ret
      end
      xchX (S)  # Get rest in X
      ldX (X CDR)  # Else
      do
         ldE (X)
         ldX (X CDR)
         eval
         atom? X  # Atom?
      until nz  # Yes
      pop X
      ret

with this on x86-64:

      .balign  16
      nop
      nop
      .global  doIfE_E
   doIfE_E:
      movq     8(%rbx), %rbx
      pushq    8(%rbx)
      movq     (%rbx), %rbx
      test     $0x06, %bl
      jnz      2f
      test     $0x08, %bl
      jnz      1f
      call     evListE_E
      jmp      2f
   1:
      movq     (%rbx), %rbx
   2:
      cmpq     $Nil, %rbx
      jz       .125
      movq     %rbx, AtSym
      popq     %rbx
      movq     (%rbx), %rbx
      test     $0x06, %bl
      jnz      ret
      test     $0x08, %bl
      jz       evListE_E
      movq     (%rbx), %rbx
      ret
   .125:
      xchg     %r13, (%rsp)
      movq     8(%r13), %r13
   .126:
      movq     (%r13), %rbx
      movq     8(%r13), %r13
      test     $0x06, %bl
      jnz      2f
      test     $0x08, %bl
      jnz      1f
      call     evListE_E
      jmp      2f
   1:
      movq     (%rbx), %rbx
   2:
      testb    $0x0E, %r13b
      jz       .126
      popq     %r13
      ret

The mapping of the individual instructions is rather straightforward.
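
To make that mapping concrete, here is a minimal sketch in Python (not the Lisp of the actual generator). It covers only the plain load, push and pop instructions from the listings above. The register assignments (E is %rbx, X is %r13) and CDR = 8 are read off those listings; the helper names `operand` and `expand` are made up for illustration.

```python
# Register assignments read off the x86-64 listings above (E -> %rbx, X -> %r13).
# CDR = 8 is inferred from "ldE (E CDR)" expanding to "movq 8(%rbx), %rbx".
REGS = {"E": "%rbx", "X": "%r13"}
OFFSETS = {"CDR": 8}


def operand(src):
    """Translate 'E', '(E)' or '(E CDR)' to AT&T syntax (hypothetical helper)."""
    if src.startswith("("):
        parts = src[1:-1].split()
        base = REGS[parts[0]]
        if len(parts) == 1:
            return f"({base})"                      # Indexed
        offs = parts[1]
        return f"{OFFSETS.get(offs, offs)}({base})"  # Indexed with offset
    return REGS[src]                                # Register


def expand(line):
    """Expand one source instruction into one x86-64 instruction (hypothetical helper)."""
    line = line.split("#")[0].strip()  # Drop a trailing comment
    op, _, arg = line.partition(" ")
    arg = arg.strip()
    if op.startswith("ld") and len(op) == 3:
        # 'ldE', 'ldX': the destination register is part of the mnemonic
        return f"movq {operand(arg)}, {REGS[op[2]]}"
    if op == "push":
        return f"pushq {operand(arg)}"
    if op == "pop":
        return f"popq {operand(arg)}"
    if op == "ret":
        return "ret"
    raise ValueError(f"not covered by this sketch: {line}")


if __name__ == "__main__":
    for src in ["push X", "ldX E", "ldE (E CDR)  # Evaluate CADR", "ldE (E)"]:
        print(f"{src:32} {expand(src)}")
```

Macro-like instructions such as 'eval', 'num?' or 'if ne' expand to several target instructions and are deliberately left out of this sketch.
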
On the x86 architecture, most of them expand to a single target
instruction.

The machine register set is defined as:

   +---+---+---+---+---+---+---+---+
   |               A           | B |  \      [A]ccumulator
   +---+---+---+---+---+---+---+---+   D     [B]yte register
   |               C               |  /      [C]ount register
   +---+---+---+---+---+---+---+---+         [D]ouble register
   |               E               |         [E]xpression register
   +---+---+---+---+---+---+---+---+

   +---+---+---+---+---+---+---+---+
   |               X               |         [X] Index register
   +---+---+---+---+---+---+---+---+         [Y] Index register
   |               Y               |         [Z] Index register
   +---+---+---+---+---+---+---+---+
   |               Z               |
   +---+---+---+---+---+---+---+---+

   +---+---+---+---+---+---+---+---+
   |               L               |         [L]ink register
   +---+---+---+---+---+---+---+---+         [S]tack pointer
   |               S               |
   +---+---+---+---+---+---+---+---+

   +-------------------------------+
   |  [z]ero    [s]ign    [c]arry  |         [F]lags
   +-------------------------------+

Source Addressing Modes:

   ldA 1234            # Immediate
   ldA R               # Register
   ldA Label           # Direct
   ldA (R)             # Indexed
   ldA (R 8)           # Indexed with offset
   ldA (R OFFS)
   ldA (Global)        # Indirect
   ldA (Global OFFS)   # Indirect with offset

Destination Addressing Modes:

   stA (Global)        # Indirect
   stA (Global OFFS)   # Indirect with offset
   ldA R               # Register
   stA (R)             # Indexed
   stA (R 8)           # Indexed with offset
   stA (R OFFS)

Target Addressing Modes:

   jmp 1234            # Absolute
   jmp Label
   jmp (R)             # Indexed
   jmp (Global)        # Indirect

The whole thing is so simple first and foremost because there is only a
single word size (i.e. 64 bits) for all instructions, with the exception
of the 'B' register for byte operations.

The instructions take the form of

   ldA something

instead of

   ld A, something

The reason for this is to have a separate instruction for each register
(ldA, ldB, ldX etc.), which probably makes it easier to output completely
different instruction sequences depending on the target architecture and
its requirements or restrictions on individual registers.

On x86, it looks like

   (asm ldA (Src Mode)
      (prinst "movq" Src "%rax") )
   ...
   (asm ldD (Src Mode)
      (prinst "movq" Src "%rax")
      (prinst "movq" (highWord Src) "%rdx") )
   ...
   (asm orA (Src Mode)
      (if (and (num? Mode) (>= 255 Mode))
         (prinst "orb" Src "%al")
         (prinst "or" Src "%rax") ) )

and so on.

Well, I hope this explains my intentions a little.

Cheers,
- Alex
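
P.S. For readers more at home in Python than in Lisp, the per-register idea can be sketched roughly as below. This is an illustration, not the actual generator. `prinst` here is a stand-in that merely formats a line, the `ASM` table and `emit` are hypothetical, and I assume that for an immediate operand 'Mode' carries the numeric value, which is how the `orA` test above reads.

```python
def prinst(op, *args):
    """Format one target instruction; a stand-in for the real 'prinst'."""
    return f"{op} {', '.join(args)}"


def ldA(src, mode=None):
    # One emitter per register: 'ldA' always targets %rax on x86-64
    return [prinst("movq", src, "%rax")]


def orA(src, mode=None):
    # Assumption: for an immediate operand, 'mode' holds its numeric value.
    # Values up to 255 can use the byte form on the low byte of %rax.
    if isinstance(mode, int) and mode <= 255:
        return [prinst("orb", src, "%al")]
    return [prinst("or", src, "%rax")]


# Hypothetical dispatch table: a different target architecture would
# supply its own table with the same keys.
ASM = {"ldA": ldA, "orA": orA}


def emit(name, src, mode=None):
    """Return the list of target instructions for one source instruction."""
    return ASM[name](src, mode)


if __name__ == "__main__":
    print(emit("orA", "$15", 15))      # byte-sized immediate
    print(emit("orA", "$4096", 4096))  # needs the full register
    print(emit("ldA", "(%rbx)"))
```

Because each register has its own entry, a port to a machine with special rules for a particular register only has to replace that one entry.
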