Hmm, I didn't explain myself properly at all.
Probably what I wrote is orthogonal to Casey's idea.
Picking up on where Casey wrote
"transform representations of various "heights" ultimately into machine code"
I was thinking of starting higher up than machine code and finishing
with something that could run reasonably well under emulation.
The idea: move old code across to be compatible with fonc++-- when it
emerges.
The -- meaning if it's statically typed now it would be then.
In the context of going from source to machine code (for a different
real or virtual machine), or to another source language (not exactly
what Casey raised):
To leverage a common starting point or two, why not take the C compiler
IR (or any GNU-language-compatible IR, or LLVM as an alternative
starting point) as the abstract machine? Starting from the IR for C (or
whatever language), you would then represent it in either some other
(new, higher-level) language or in (real or virtual) machine code.
So if there is a pattern such as a "forall loop" that OMeta or FermaT
can identify, then you could either prepare a virtual machine ISA or
indeed up-level it to something compatible with a typed version of
what emerges from fonc.
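To make the "forall loop" idea concrete, here is a minimal sketch in plain Python. It is neither OMeta nor FermaT syntax, and the tuple-based IR (the op names "set", "label", "if_ge_goto", "add", "goto", the "forall" node, and the function lift_counted_loop) is entirely hypothetical, invented for illustration. It assumes every body op keeps its destination at index 1.

```python
# Hypothetical toy IR: each op is a tuple, destination operand at index 1.
CONTROL_OPS = {"label", "goto", "if_ge_goto"}

def lift_counted_loop(ir):
    """Return ("forall", var, start, bound, body) if `ir` has the shape of a
    simple counted loop, otherwise None."""
    if len(ir) < 6:
        return None
    init, head, test, *body, step, back, exit_label = ir

    # Expected prologue: ("set", var, start)
    if len(init) != 3 or init[0] != "set":
        return None
    _, var, start = init

    # Expected loop test: ("if_ge_goto", var, bound, exit)
    if len(test) != 4 or test[0] != "if_ge_goto" or test[1] != var:
        return None
    bound = test[2]

    shape_ok = (
        len(head) == 2 and head[0] == "label"
        and step == ("add", var, var, 1)        # var is incremented by one
        and back == ("goto", head[1])           # jump back to the loop head
        and exit_label == ("label", test[3])    # the test exits to the end
    )
    if not shape_ok:
        return None

    # The body must be straight-line code that leaves var and bound alone.
    for op in body:
        if op[0] in CONTROL_OPS or op[1] in (var, bound):
            return None
    return ("forall", var, start, bound, list(body))

loop_ir = [
    ("set", "i", 0),
    ("label", "L0"),
    ("if_ge_goto", "i", "n", "L1"),
    ("load", "t", "a", "i"),
    ("add", "s", "s", "t"),
    ("add", "i", "i", 1),
    ("goto", "L0"),
    ("label", "L1"),
]
lifted = lift_counted_loop(loop_ir)
```

A real recogniser would have to cope with many loop shapes and with aliasing; this only shows the flavour of matching a low-level pattern and replacing it with a higher-level node.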
Of course you would have to have the language spec for fonc++ before you
set out on a particular case.
Maybe I dug a hole for myself. Oh well. :-[
Regards,
Gerry Jensen
02 9713 6004
BGB wrote:
my effort had not gone nearly so high "up" the abstraction tree, but
instead operated in a space more like an abstracted x86 machine.
moving to a much higher level model, such as that of GCC IR or LLVM
IR, would likely be difficult to pull off effectively starting from
"real" machine code, such as x86.
what I had done was essentially just partly inverting several of the
low-level stages in the process:
assembly: since my assembler was mostly data-driven, disassembly is
not difficult using essentially the same data;
partly abstracting over matters of word-size and opcode argument forms;
...
then, it was this partially-abstracted form which was interpreted.
this was at a similar level of abstraction to that in my lower-level
codegen (namely dealing with registers and values as handles), rather
than making it all the way back up to a target-neutral IR.
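As an illustration of the data-driven point above (one table serving both the assembler and the disassembler), here is a minimal Python sketch. It is not BGB's code: the table layout and the names OPCODES, assemble and disassemble are assumptions made for the example, and it covers only four single-byte x86 encodings (nop, ret, push r32, pop r32).

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# One shared table: (mnemonic, base opcode byte, takes a register operand).
# For push/pop the register number is added to the base byte.
OPCODES = [
    ("nop", 0x90, False),
    ("ret", 0xC3, False),
    ("push", 0x50, True),
    ("pop", 0x58, True),
]

def assemble(lines):
    """Encode lines such as 'push ebp' by looking them up in OPCODES."""
    out = bytearray()
    for line in lines:
        mnemonic, *args = line.split()
        for name, base, has_reg in OPCODES:
            if name == mnemonic and has_reg == bool(args):
                out.append(base + (REGS.index(args[0]) if has_reg else 0))
                break
        else:
            raise ValueError(f"cannot assemble: {line!r}")
    return bytes(out)

def disassemble(code):
    """Decode bytes by running the same table in the other direction."""
    result = []
    for byte in code:
        for name, base, has_reg in OPCODES:
            if has_reg and base <= byte < base + len(REGS):
                result.append(f"{name} {REGS[byte - base]}")
                break
            if not has_reg and byte == base:
                result.append(name)
                break
        else:
            raise ValueError(f"unknown opcode byte: {byte:#04x}")
    return result

program = ["push ebp", "pop eax", "nop", "ret"]
encoded = assemble(program)
decoded = disassemble(encoded)
```

Adding an instruction means adding one table row, and both directions pick it up. Real x86 needs prefixes, ModRM bytes and immediates, which this sketch leaves out.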
converting to a higher-level IR would likely require something
analogous to a compiler+optimizer, namely to translate these decoded
instructions into generic IR sequences, and then try to optimize away
all the cruft which doesn't matter (such as all the "eflags" magic for
sequences which don't actually care about eflags).
...
the "eflags" issue is mostly because, for example, in x86 nearly every
conventional opcode modifies eflags, but in the majority of cases,
these changed flags are irrelevant (however, a forward scan and
bit-masking could likely allow for detecting cases where the modified
flags are known to be irrelevant).
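The forward scan with bit-masking might look roughly like the Python sketch below. It is a sketch under assumptions, not the actual implementation: the FLAG_EFFECTS table and the function needed_flags are hypothetical, the scan is limited to one straight-line block, and only a handful of opcodes are listed. The flag bit positions and the per-opcode read/write sets follow the usual x86 definitions.

```python
# EFLAGS status bits (bit positions as defined for x86).
CF, PF, AF, ZF, SF, OF = 0x001, 0x004, 0x010, 0x040, 0x080, 0x800
ARITH = CF | PF | AF | ZF | SF | OF

# Hypothetical table: mnemonic -> (flags read, flags written).
FLAG_EFFECTS = {
    "add": (0, ARITH),
    "sub": (0, ARITH),
    "cmp": (0, ARITH),
    "adc": (CF, ARITH),
    "inc": (0, ARITH & ~CF),   # inc leaves CF untouched
    "mov": (0, 0),
    "jz":  (ZF, 0),
}

def needed_flags(block, live_out=ARITH):
    """For each instruction in a straight-line block, return the mask of
    flags it writes that some later instruction may still read.

    live_out is the set of flags assumed to be read after the block; the
    default (all of them) is the conservative choice."""
    result = []
    for i, op in enumerate(block):
        _, written = FLAG_EFFECTS[op]
        pending = written          # flags whose fate is not yet decided
        live = 0
        for later in block[i + 1:]:
            reads, writes = FLAG_EFFECTS[later]
            live |= pending & reads           # read before being overwritten
            pending &= ~(reads | writes)      # decided either way
            if not pending:
                break
        live |= pending & live_out            # whatever survives the block
        result.append(live)
    return result

masks = needed_flags(["add", "mov", "sub", "jz"], live_out=0)
```

An instruction whose mask comes back as 0 can be translated without any flag computation at all; in the example only the "sub" needs to produce ZF, for the "jz" that follows it.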
also, x86 includes a small number of "very complex" opcodes, such as
"cpuid", which could be awkward if trying to produce an entirely
generic IR (since cpuid changes its behavior and results depending on
the values contained in certain registers), ...
this level of translation though is likely to either rule out or
hinder the use of self-modifying code, since SMC would essentially
invalidate previously translated sequences.
(in my case I had dealt with SMC simply by flushing the entire opcode
cache, which in this case was essentially just a big hash-table
holding "opcode" structures).
ideally, with a more complex "decode" process, the process of flushing
on SMC could be done cheaply and incrementally, rather than, say,
essentially having to recompile an app in memory continuously simply
as it happens to be self-modifying.
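One way the incremental flush could be organised is to index cached entries by the memory page(s) they were decoded from, so that a write only discards entries on the pages it touches. The Python sketch below assumes 4 KiB pages; the class DecodeCache and its method names are hypothetical and are not taken from the interpreter described above.

```python
from collections import defaultdict

PAGE_SHIFT = 12  # assumption: 4 KiB pages

def _pages(addr, length):
    """Page numbers covered by the byte range [addr, addr + length)."""
    return range(addr >> PAGE_SHIFT, ((addr + length - 1) >> PAGE_SHIFT) + 1)

class DecodeCache:
    def __init__(self):
        self.entries = {}                # start address -> decoded opcode
        self.by_page = defaultdict(set)  # page number -> start addresses

    def insert(self, addr, length, decoded):
        self.entries[addr] = decoded
        # An instruction may straddle a page boundary, so register it
        # under every page it occupies.
        for page in _pages(addr, length):
            self.by_page[page].add(addr)

    def lookup(self, addr):
        return self.entries.get(addr)

    def flush_all(self):
        """The simple strategy: throw everything away."""
        self.entries.clear()
        self.by_page.clear()

    def on_write(self, addr, length=1):
        """The incremental strategy: drop only entries on written pages."""
        for page in _pages(addr, length):
            for start in self.by_page.pop(page, ()):
                # pop with a default: a straddling entry may already have
                # been removed through its other page.
                self.entries.pop(start, None)

cache = DecodeCache()
cache.insert(0x1000, 2, "A")   # entirely on page 1
cache.insert(0x1FFF, 3, "B")   # straddles pages 1 and 2
cache.insert(0x3000, 1, "C")   # entirely on page 3
cache.on_write(0x2001)         # guest writes one byte on page 2
```

After the write, only the entry that overlaps page 2 is gone; the other two survive. Page granularity is a trade-off: finer tracking discards less but costs more bookkeeping per write.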
luckily, most executable code is marked as read-only, and SMC cases
are fairly rare, and so attempts at SMC are more often grounds for a
simulated GPF, rather than grounds for flushing the decode cache.
static translation, however, is likely to exclude the possibility of SMC.
or such...
----- Original Message -----
From: "Monty Zukowski" <mo...@codetransform.com>
To: "Fundamentals of New Computing" <fonc@vpri.org>
Sent: Tuesday, June 22, 2010 8:37 AM
Subject: Re: [fonc] Reverse OMeta and Emulation
GNU C was explicitly designed to make its intermediate representation
hard to work with. LLVM is a more practical choice.
Monty
On Mon, Jun 21, 2010 at 6:02 PM, Gerry J <geral...@tpg.com.au> wrote:
You may find the concept of semantic slicing relevant:
http://www.cse.dmu.ac.uk/~mward/martin/papers/csmr2005-t.pdf
There is software at:
http://www.cse.dmu.ac.uk/~mward/fermat.html
One possible path to explore is to take the intermediate
representation of source from GNU C etc. as the "assembly language"
of a VM and reverse from that to a more portable VM, as in Squeak or
Java.
Perhaps OMeta could be combined in some way with FermaT to recognise
patterns and port legacy code to a fonc VM?
Regards,
Gerry Jensen
02 9713 6004
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc