Hi Jakob,

While LLVM is surely an interesting project, I think it would be overkill
by far for what we need here.

What I'm doing is extremely simple. I defined a (mostly) single-address
assembly language, with a rather basic set of instructions which I
believe can easily be mapped to differing architectures. At least, there
are no assumptions about arity, register sets or instruction formats.

And, most important, it is quite readable (as opposed to most current
assembly languages, including LLVM)!


Though I think it is still too early to publish it, let me show you two
examples from the current sources ('car' and 'if').

   # (c....r 'lst) -> any
   (code 'doCarE_E 2)
      push X
      ldX E
      ldE (E CDR)  # Evaluate CADR
      ldE (E)
      eval
      num? E  # Check list
      jnz lstErrEX
      ldE (E)  # Take CAR
      pop X
      ret

On x86-64 this expands to:

      .balign 16
      nop
      nop
      .global doCarE_E
   doCarE_E:
      pushq %r13
      movq %rbx, %r13
      movq 8(%rbx), %rbx
      movq (%rbx), %rbx
      test $0x06, %bl
      jnz 2f
      test $0x08, %bl
      jnz 1f
      call evListE_E
      jmp 2f
   1:
      movq (%rbx), %rbx
   2:
      testb $0x06, %bl
      jnz lstErrEX
      movq (%rbx), %rbx
      popq %r13
      ret


'if' is a little longer:

   # (if 'any1 'any2 . prg) -> any
   (code 'doIfE_E 2)
      ldE (E CDR)  # Body
      push (E CDR)  # Push rest
      ldE (E)  # Get condition
      eval  # Eval condition
      nil? E
      if ne  # Non-NIL
         stE (AtSym)
         pop E  # Get rest
         ldE (E)  # Consequent
         eval/ret
      end
      xchX (S)  # Get rest in X
      ldX (X CDR)  # Else
      do
         ldE (X)
         ldX (X CDR)
         eval
         atom? X  # Atom?
      until nz  #  Yes
      pop X
      ret

with this on x86-64:

      .balign 16
      nop
      nop
      .global doIfE_E
   doIfE_E:
      movq 8(%rbx), %rbx
      pushq 8(%rbx)
      movq (%rbx), %rbx
      test $0x06, %bl
      jnz 2f
      test $0x08, %bl
      jnz 1f
      call evListE_E
      jmp 2f
   1:
      movq (%rbx), %rbx
   2:
      cmpq $Nil, %rbx
      jz .125
      movq %rbx, AtSym
      popq %rbx
      movq (%rbx), %rbx
      test $0x06, %bl
      jnz ret
      test $0x08, %bl
      jz evListE_E
      movq (%rbx), %rbx
      ret
   .125:
      xchg %r13, (%rsp)
      movq 8(%r13), %r13
   .126:
      movq (%r13), %rbx
      movq 8(%r13), %r13
      test $0x06, %bl
      jnz 2f
      test $0x08, %bl
      jnz 1f
      call evListE_E
      jmp 2f
   1:
      movq (%rbx), %rbx
   2:
      testb $0x0E, %r13b
      jz .126
      popq %r13
      ret


The mapping of the individual instructions is rather straightforward. On
the x86 architecture, most of them expand to a single target
instruction.
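
To make the one-to-one mapping concrete, here is a rough sketch in Python.
It is not the actual generator, only an illustration. The register
assignment (A -> %rax, E -> %rbx, X -> %r13, S -> %rsp) and the CDR offset
of 8 are read off the listings above; the names REGS, operand and expand
are made up for this example, and only 'ld', 'push', 'pop' and 'ret' are
handled.

```python
# Register assignment as it appears in the x86-64 listings above;
# only the registers visible there are included.
REGS = {"A": "%rax", "E": "%rbx", "X": "%r13", "S": "%rsp"}
CDR = 8  # offset of the CDR part, as seen in "movq 8(%rbx), %rbx"


def operand(src):
    """Render a register or an (R [offset]) operand in AT&T syntax."""
    if src.startswith("("):
        parts = src.strip("()").split()
        reg = REGS[parts[0]]
        offs = parts[1] if len(parts) > 1 else ""
        if offs == "CDR":
            offs = str(CDR)
        return f"{offs}({reg})"
    return REGS[src]


def expand(line):
    """Expand one generic instruction into a single x86-64 instruction."""
    op, _, arg = line.partition(" ")
    if op == "ret":
        return "ret"
    if op in ("push", "pop"):
        return f"{op}q {operand(arg)}"
    if op.startswith("ld"):
        # ldE, ldX, ...: the destination register is part of the mnemonic
        return f"movq {operand(arg)}, {REGS[op[2:]]}"
    raise ValueError(f"instruction not covered by this sketch: {line}")


if __name__ == "__main__":
    for line in ["push X", "ldX E", "ldE (E CDR)", "ldE (E)", "pop X", "ret"]:
        print("   " + expand(line))
```

Instructions like 'eval' or 'num?' expand to short sequences instead and
are left out here.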

The machine register set is defined as:

      +---+---+---+---+---+---+---+---+
      |               A           | B |  \      [A]ccumulator
      +---+---+---+---+---+---+---+---+   D     [B]yte register
      |               C               |  /      [C]ount register
      +---+---+---+---+---+---+---+---+         [D]ouble register
      |               E               |         [E]xpression register
      +---+---+---+---+---+---+---+---+


      +---+---+---+---+---+---+---+---+
      |               X               |         [X] Index register
      +---+---+---+---+---+---+---+---+         [Y] Index register
      |               Y               |         [Z] Index register
      +---+---+---+---+---+---+---+---+
      |               Z               |
      +---+---+---+---+---+---+---+---+


      +---+---+---+---+---+---+---+---+
      |               L               |         [L]ink register
      +---+---+---+---+---+---+---+---+         [S]tack pointer
      |               S               |
      +---+---+---+---+---+---+---+---+


      +-------------------------------+
      |  [z]ero    [s]ign    [c]arry  |         [F]lags
      +-------------------------------+

   Source Addressing Modes:
      ldA 1234          # Immediate
      ldA R             # Register
      ldA Label         # Direct
      ldA (R)           # Indexed
      ldA (R 8)         # Indexed with offset
      ldA (R OFFS)
      ldA (Global)      # Indirect
      ldA (Global OFFS) # Indirect with offset

   Destination Addressing Modes:
      stA (Global)      # Indirect
      stA (Global OFFS) # Indirect with offset
      ldA R             # Register
      stA (R)           # Indexed
      stA (R 8)         # Indexed with offset
      stA (R OFFS)

   Target Addressing Modes:
      jmp 1234          # Absolute
      jmp Label
      jmp (R)           # Indexed
      jmp (Global)      # Indirect


The whole thing is so simple mainly because there is only a single word
size (i.e. 64 bits) for all instructions, with the exception of the 'B'
register for byte operations.

The instructions take the form of

   ldA something

instead of

   ld A, something

The reason for this is to have a separate instruction for each register
(ldA, ldB, ldX etc.), which probably makes it easier to output completely
different instruction sequences depending on the target architecture and
its requirements or restrictions on individual registers.

On x86, it looks like

   (asm ldA (Src Mode)
      (prinst "movq" Src "%rax") )

   ...

   (asm ldD (Src Mode)
      (prinst "movq" Src "%rax")
      (prinst "movq" (highWord Src) "%rdx") )

   ...

   (asm orA (Src Mode)
      (if (and (num? Mode) (>= 255 Mode))
         (prinst "orb" Src "%al")
         (prinst "or" Src "%rax") ) )

and so on.
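
For readers who do not know Lisp, the same idea looks roughly like this in
Python: one emitter function per mnemonic, looked up by name. This is only
a sketch. The names EMITTERS, asm and emit are hypothetical, and the 'imm'
parameter assumes that the test in 'orA' above selects immediate operands
no larger than 255.

```python
EMITTERS = {}  # mnemonic -> function producing target instructions


def asm(name):
    """Register an emitter function under a generic mnemonic."""
    def register(fn):
        EMITTERS[name] = fn
        return fn
    return register


@asm("ldA")
def ld_a(src, imm=None):
    return [f"movq {src}, %rax"]


@asm("orA")
def or_a(src, imm=None):
    # Assumed reading of the Lisp condition: use the byte form
    # when the operand is an immediate value of at most 255.
    if imm is not None and imm <= 255:
        return [f"orb {src}, %al"]
    return [f"or {src}, %rax"]


def emit(name, src, imm=None):
    """Look up the emitter for a mnemonic and return its output lines."""
    return EMITTERS[name](src, imm)


if __name__ == "__main__":
    print(emit("ldA", "(%rbx)"))
    print(emit("orA", "$15", imm=15))
    print(emit("orA", "$4096", imm=4096))
```

Because each register has its own entry, a different target could emit
something entirely different for 'ldA' than for 'ldX' without touching the
rest of the table.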


Well, I hope this explains my intentions a little.

Cheers,
- Alex