my effort had not gone nearly so high "up" the abstraction tree, but instead
operated in a space more like an abstracted x86 machine.
moving to a much higher level model, such as that of GCC IR or LLVM IR,
would likely be difficult to pull off effectively starting from "real"
machine code, such as x86.
what I had done was essentially just partly inverting several of the
low-level stages in the process:
assembly: since my assembler was mostly data-driven, disassembly is not
difficult using essentially the same data;
partly abstracting over matters of word-size and opcode argument forms;
...
then, it was this partially-abstracted form which was interpreted.
this was at a similar level of abstraction to that in my lower-level codegen
(namely dealing with registers and values as handles), rather than making it
all the way back up to a target-neutral IR.
converting to a higher-level IR would likely require something analogous to
a compiler+optimizer, namely to translate these decoded instructions into
generic IR sequences, and then try to optimize away all the cruft which
doesn't matter (such as all the "eflags" magic for sequences which don't
actually care about eflags).
...
the "eflags" issue arises mostly because, for example, in x86 nearly every
conventional opcode modifies eflags, but in the majority of cases, these
changed flags are irrelevant (however, a forward scan and bit-masking could
likely allow for detecting cases where the modified flags are known to be
irrelevant).
also, x86 includes a small number of "very complex" opcodes, such as
"cpuid", which could be awkward if trying to produce an entirely generic IR
(since cpuid changes its behavior and results depending on the values
contained in certain registers), ...
this level of translation, though, is likely to either rule out or hinder
the use of self-modifying code (SMC), since SMC would essentially invalidate
previously translated sequences.
(in my case I had dealt with SMC simply by flushing the entire opcode cache,
which in this case was essentially just a big hash-table holding "opcode"
structures).
ideally, with a more complex "decode" process, flushing on SMC could be
done cheaply and incrementally, rather than, say, essentially having to
continuously recompile an app in memory simply because it happens to be
self-modifying.
luckily, most executable code is marked as read-only, and SMC cases are
fairly rare, and so attempts at SMC are more often grounds for a simulated
GPF, rather than grounds for flushing the decode cache.
static translation, however, is likely to exclude the possibility of SMC.
or such...
----- Original Message -----
From: "Monty Zukowski" <[email protected]>
To: "Fundamentals of New Computing" <[email protected]>
Sent: Tuesday, June 22, 2010 8:37 AM
Subject: Re: [fonc] Reverse OMeta and Emulation
GNU C was explicitly designed to make its intermediate representation
hard to work with. LLVM is a more practical choice.
Monty
On Mon, Jun 21, 2010 at 6:02 PM, Gerry J <[email protected]> wrote:
You may find the concept of semantic slicing relevant:
http://www.cse.dmu.ac.uk/~mward/martin/papers/csmr2005-t.pdf
There is software at:
http://www.cse.dmu.ac.uk/~mward/fermat.html
One possible path to explore is to take GNU C etc intermediate
representation of source as the "assembly language" of a VM and reverse
from
that to a more portable VM, as in Squeak or Java.
Perhaps Ometa could be combined in some way with FermaT to recognise
patterns and port legacy code to a fonc VM ?
Regards,
Gerry Jensen
02 9713 6004
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc