Hmm, I didn't explain myself properly at all.
What I wrote is probably orthogonal to Casey's idea.
Picking up on where Casey wrote:

"transform representations of various "heights" ultimately into machine code"

I was thinking of starting higher up than machine code and finishing with something that could run reasonably well under emulation. The idea: move old code across to be compatible with fonc++-- when it emerges.
The "--" means that if the code is statically typed now, it would still be statically typed then.
This is in the context of going from source to machine code for a different real or virtual machine, or to another source language (not exactly what Casey raised).
To leverage a common starting point or two, why not take the C compiler IR (or any GNU-language-compatible IR, or LLVM as an alternative starting point) as the abstract machine? Starting from the IR for C (or whatever language), you would represent it either in some other, new higher-level language or in a (real or virtual) machine code language. So if there is a pattern such as a "forall loop" that OMeta or FermaT can identify, then you could either prepare a virtual machine ISA or indeed up-level it to something compatible with a typed version of what emerges from fonc. Of course you would have to have the language spec for fonc++ before you set out on a particular case.
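
To make the up-levelling idea a little more concrete, here is a minimal sketch in Python of recognising a counted loop in a toy three-address IR and lifting it to a structured node. Everything in it is made up for illustration: the tuple-based IR, `ForRange`, and `lift_counted_loop` are hypothetical, the IR is not GCC or LLVM IR, and the matcher is hand-written rather than expressed in OMeta or FermaT.

```python
from dataclasses import dataclass

@dataclass
class ForRange:
    var: str      # induction variable
    start: int    # initial value
    stop: str     # name of the exclusive upper bound
    body: list    # IR instructions forming the loop body

def lift_counted_loop(ir):
    """Lift the first 'i = c; L: if i >= n goto E; body; i += 1; goto L; E:' shape.

    Toy IR convention (an assumption): each instruction is a tuple whose
    field 0 is the opcode and whose field 1 is the destination or first operand.
    The sketch also assumes label L is only targeted by the loop's own back edge.
    """
    for k in range(len(ir) - 5):
        init, head, test = ir[k], ir[k + 1], ir[k + 2]
        if not (init[0] == "const" and head[0] == "label" and test[0] == "bge"):
            continue
        var, start, top = init[1], init[2], head[1]
        if test[1] != var:
            continue
        bound, exit_label = test[2], test[3]
        # Look for the back edge: increment, jump to the top, then the exit label.
        for j in range(k + 3, len(ir) - 2):
            if (ir[j] == ("addi", var, var, 1)
                    and ir[j + 1] == ("jmp", top)
                    and ir[j + 2] == ("label", exit_label)):
                body = ir[k + 3:j]
                # Give up if the body has its own control flow or writes the counter.
                if any(ins[0] in ("label", "jmp", "bge") or ins[1] == var for ins in body):
                    break
                return ir[:k] + [ForRange(var, start, bound, body)] + ir[j + 3:]
    return ir

# Sum of a[0..n-1], written as low-level IR.
ir = [
    ("const", "s", 0),
    ("const", "i", 0),
    ("label", "L"),
    ("bge", "i", "n", "E"),
    ("load", "t", "a", "i"),
    ("add", "s", "s", "t"),
    ("addi", "i", "i", 1),
    ("jmp", "L"),
    ("label", "E"),
    ("ret", "s"),
]
lifted = lift_counted_loop(ir)
```

In this example the eight loop instructions collapse into a single `ForRange` node sitting between the initialisation of `s` and the `ret`. A real tool would need far more than this (aliasing, multiple exits, typing), which is where a proper transformation system would earn its keep.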

Maybe I dug a hole for myself. Oh well. :-[

Regards,
Gerry Jensen
02 9713 6004



BGB wrote:
my effort had not gone nearly so high "up" the abstraction tree, but instead operated in a space more like an abstracted x86 machine.

moving to a much higher level model, such as that of GCC IR or LLVM IR, would likely be difficult to pull off effectively starting from "real" machine code, such as x86.

what I had done was essentially just partly inverting several of the low-level stages in the process:
inverting assembly (since my assembler was mostly data-driven, disassembly is not difficult using essentially the same data);
partly abstracting over matters of word-size and opcode argument forms;
...

then, it was this partially-abstracted form which was interpreted.
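
As a rough illustration of how one table can drive both directions, here is a minimal Python sketch. The table layout and the names `OPTABLE`, `assemble`, and `disassemble` are assumptions for this example and not the actual assembler described above. It covers only a handful of one-byte 32-bit x86 encodings (`nop`, `ret`, and the `push`/`pop`/`inc` forms that add the register number to a base opcode).

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# (mnemonic, base opcode byte, takes a register operand)
OPTABLE = [
    ("nop",  0x90, False),
    ("ret",  0xC3, False),
    ("push", 0x50, True),   # 0x50 + register number
    ("pop",  0x58, True),   # 0x58 + register number
    ("inc",  0x40, True),   # 0x40 + register number (32-bit mode only)
]

def assemble(lines):
    """Encode lines such as 'push ebp' by searching the table by mnemonic."""
    out = bytearray()
    for line in lines:
        parts = line.split()
        for name, base, has_reg in OPTABLE:
            if name == parts[0] and has_reg == (len(parts) == 2):
                out.append(base + (REGS.index(parts[1]) if has_reg else 0))
                break
        else:
            raise ValueError(f"cannot assemble: {line!r}")
    return bytes(out)

def disassemble(code):
    """Decode bytes by searching the same table by opcode value."""
    lines = []
    for byte in code:
        for name, base, has_reg in OPTABLE:
            if has_reg and base <= byte < base + 8:
                lines.append(f"{name} {REGS[byte - base]}")
                break
            if not has_reg and byte == base:
                lines.append(name)
                break
        else:
            raise ValueError(f"unknown opcode byte: {byte:#04x}")
    return lines

source = ["push ebp", "inc eax", "pop ebp", "ret"]
code = assemble(source)
```

Here `disassemble(code)` gives back `source`, since both functions read the same rows. Real x86 needs prefixes, ModRM bytes, and immediates, so a practical table carries much more per row, but the principle of inverting the lookup stays the same.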

this was at a similar level of abstraction to that in my lower-level codegen (namely dealing with registers and values as handles), rather than making it all the way back up to a target-neutral IR.


converting to a higher-level IR would likely require something analogous to a compiler+optimizer, namely to translate these decoded instructions into generic IR sequences, and then try to optimize away all the cruft which doesn't matter (such as all the "eflags" magic for sequences which don't actually care about eflags).
...

the "eflags" issue is mostly because, for example, in x86 nearly every conventional opcode modifies eflags, but in the majority of cases, these changed flags are irrelevant (however, a forward scan and bit-masking could likely allow for detecting cases where the modified flags are known to be irrelevant).
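
A minimal Python sketch of the bit-masking idea follows. It is an illustration under stated assumptions, not the scheme from the interpreter above: the `FLAG_EFFECTS` table and `needed_flags` are hypothetical names, the table covers only a few mnemonics, and it handles a single straight-line block. It also walks the block backwards, which is the usual way to compute liveness; a forward scan from each instruction should reach the same answer.

```python
# Status flag bits at their EFLAGS positions.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11
ALL = CF | PF | AF | ZF | SF | OF

# mnemonic -> (flags read, flags written); a deliberately tiny table.
FLAG_EFFECTS = {
    "mov": (0, 0),
    "add": (0, ALL),
    "sub": (0, ALL),
    "cmp": (0, ALL),
    "inc": (0, ALL & ~CF),   # inc leaves CF untouched
    "adc": (CF, ALL),
    "jz":  (ZF, 0),
    "jc":  (CF, 0),
}

def needed_flags(block, live_out=ALL):
    """For each instruction, return the mask of flags it writes that are read later.

    live_out is the set of flags assumed live after the block; ALL is the
    conservative choice when the successor code is unknown.
    """
    live = live_out
    needed = [0] * len(block)
    for idx in range(len(block) - 1, -1, -1):
        reads, writes = FLAG_EFFECTS[block[idx]]
        needed[idx] = writes & live
        # Flags written here are dead above this point unless also read here.
        live = (live & ~writes) | reads
    return needed

block = ["add", "mov", "sub", "inc", "cmp", "jz"]
masks = needed_flags(block)
```

For this block only `cmp` gets a non-zero mask: the flags produced by `add`, `sub`, and `inc` are all overwritten before anything reads them, so a translator could skip computing them.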

also, x86 includes a small number of "very complex" opcodes, such as "cpuid", which could be awkward if trying to produce an entirely generic IR (since cpuid changes its behavior and results depending on the values contained in certain registers), ...


this level of translation though is likely to either rule out or hinder the use of self-modifying code, since SMC would essentially invalidate previously translated sequences.

(in my case I had dealt with SMC simply by flushing the entire opcode cache, which in this case was essentially just a big hash-table holding "opcode" structures).

ideally, with a more complex "decode" process, flushing on SMC could be done cheaply and incrementally, rather than, say, essentially having to continuously recompile an app in memory simply because it happens to be self-modifying.
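
One way to sketch the difference between the two strategies in Python: keep the hash table of decoded ops, but also index it by guest page, so a write drops only the ops on the page it touches. The class and method names, the 4 KiB page size, and the `decode` callback signature are all assumptions for this example; `flush_all` corresponds to the simple whole-cache flush described above.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages

class DecodeCache:
    def __init__(self, memory, decode):
        self.memory = memory   # bytearray standing in for guest memory
        self.decode = decode   # callable(memory, addr) -> (op, length in bytes)
        self.ops = {}          # addr -> (op, length)
        self.by_page = {}      # page number -> set of cached addrs touching it

    def fetch(self, addr):
        """Return the decoded op at addr, decoding and caching it on a miss."""
        hit = self.ops.get(addr)
        if hit is None:
            hit = self.decode(self.memory, addr)
            self.ops[addr] = hit
            last = addr + hit[1] - 1
            # An op that straddles a page boundary is registered on both pages.
            for page in range(addr >> PAGE_SHIFT, (last >> PAGE_SHIFT) + 1):
                self.by_page.setdefault(page, set()).add(addr)
        return hit[0]

    def write(self, addr, value):
        """Store a byte and invalidate only ops on the written page."""
        self.memory[addr] = value
        for cached in self.by_page.pop(addr >> PAGE_SHIFT, ()):
            # The op may already be gone if its other page was written earlier.
            self.ops.pop(cached, None)

    def flush_all(self):
        """The blunt alternative: throw away every decoded op."""
        self.ops.clear()
        self.by_page.clear()

decoded = []  # records which addresses were actually decoded

def toy_decode(memory, addr):
    # Hypothetical one-byte "instruction set": the op is just the byte value.
    decoded.append(addr)
    return (memory[addr], 1)

mem = bytearray(0x3000)
mem[0x1000], mem[0x2000] = 0x90, 0xC3
cache = DecodeCache(mem, toy_decode)
cache.fetch(0x1000)
cache.fetch(0x2000)
cache.write(0x1004, 0xFF)   # same page as 0x1000, different page from 0x2000
```

After the write, the op at 0x1000 has been dropped and will be re-decoded on its next fetch, while the op at 0x2000 stays cached. Page granularity is coarse, so a page that mixes code and frequently written data would still thrash; finer tracking is possible at the cost of more bookkeeping.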

luckily, most executable code is marked as read-only, and SMC cases are fairly rare, and so attempts at SMC are more often grounds for a simulated GPF, rather than grounds for flushing the decode cache.

static translation, however, is likely to exclude the possibility of SMC.


or such...


----- Original Message -----
From: "Monty Zukowski" <mo...@codetransform.com>
To: "Fundamentals of New Computing" <fonc@vpri.org>
Sent: Tuesday, June 22, 2010 8:37 AM
Subject: Re: [fonc] Reverse OMeta and Emulation


GNU C was explicitly designed to make its intermediate representation
hard to work with.  LLVM is a more practical choice.

Monty

On Mon, Jun 21, 2010 at 6:02 PM, Gerry J <geral...@tpg.com.au> wrote:
You may find the concept of semantic slicing relevant:
http://www.cse.dmu.ac.uk/~mward/martin/papers/csmr2005-t.pdf
There is software at:
http://www.cse.dmu.ac.uk/~mward/fermat.html

One possible path to explore is to take GNU C etc intermediate
representation of source as the "assembly language" of a VM and reverse from
that to a more portable VM, as in Squeak or Java.
Perhaps OMeta could be combined in some way with FermaT to recognise
patterns and port legacy code to a fonc VM?

Regards,
Gerry Jensen
02 9713 6004



_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


