On Mon, Aug 26, 2002 at 08:13:16PM +0000, Cyber Wiz wrote: > Guarav had a good idea to use the GCJ sources to build a GCIL > looked into i saw that the GCJ sources are heavily confusing to browse and
GCJ cannot compile C/C++/ADA or any other of the GCC languages ... so it's typically a waste of time trying to hack GCJ unless you want a Java to IL compiler ... (duh !) > GCJ and Jilc AFAIK, JILC does not convert JVM to IL fully ... I had a peek ... The jdasm code works (except for my AWT code , coz the internal anonymous '$0' class fail miserably)... Gaurav, could you fix that ? > any ideas, suggestions, assistance, etc., would be welcome Read the below passage from Portable.Net FAQ .. for your answer ... IMO, I'm incapable of hacking GCC for such a major change, But if you can , good for you ! (and all gcc users) <snip> Why don't you use gcc as the basis for your C# compiler? A common question that arises is why we aren't using gcc to compile C# code to IL. Strategically, we would like to be able to reuse all of the good work that has gone into gcc. The DotGNU Project currently has an open request for someone to volunteer to modify gcc to generate IL bytecode. However it isn't quite as easy as it looks. The following script of a hypothetical discussion provides a blow by blow account of why this is so hard. This script is based in part on e-mails we have exchanged with users in the past. Why don't you add C# to the list of languages gcc supports? Because it won't solve the problem that we need to solve. Initially we need a C# compiler that can generate IL bytecode for the .NET platform. Later, we may need a C# compiler that can generate native code as well, but that is optional. Putting a C# parser on the front of gcc would give us a native compiler, but it won't give us an IL bytecode compiler. So what? Add an IL bytecode backend to gcc, and you'll solve your problem, and also be able to compile C, C++, Fortran, etc, to .NET. This is not as easy as it looks. Gcc is divided into a number of phases: parsing, semantic analysis, tree-to-RTL conversion, RTL handling (including optimization), and final native code generation. The hard part is RTL (Register Transfer Language). This part of gcc is hard-wired to generate code for register-based CPU's such as i386, PPC, Sparc, etc. RTL is not designed for generating code for stack-based abstract machines such as IL. Also, RTL loses a lot of the type and code structure information that IL needs in the final output file. By the time RTL gets the code, information about whether a value is an integer or an object reference is mostly lost. Information about the class structure of the code is lost. This information is critical for correct compilation of C# to IL. But hang on a second! Gcj, the Java back-end for gcc, does stack machines! Why not do something like that? Err ... no it doesn't. The Java bytecode stuff in gcj is not organised as an RTL back-end. When gcj compiles Java, it performs parsing and semantic analysis in the front-end, like the other supported languages. Then the parse tree is sent in one of two different directions. If gcj is compiling to native, the parse tree is handed to the RTL core of the compiler, and it takes over. If gcj is compiling to bytecode, the parse tree is handed to a completely separate code generator that knows about Java bytecode. Because gcj does NOT implement a bytecode RTL back-end for gcc, it cannot compile C, C++, etc down to bytecode. Java bytecode is a special case that only works for the Java front-end. But what about egcs-jvm? Doesn't it compile C to Java bytecode? It's a hack. The code that it generates is horrible, and does not conform to the usual conventions that the JVM requires. If one compiled Java code using this back-end, it wouldn't work with normal Java code due to the differences in calling conventions and what-not. The biggest problem that the author of egcs-jvm he had was trying to work around the register machine assumptions in the code. The result wasn't pretty. He has said that it would be easier to throw the JVM away and invent a register-based abstract machine than try to make gcc generate efficient stack machine code Isn't there a gcc port to the Transputer, which is stack-based? Yes there is, for an older version of gcc (2.7.2). The source can be found at http://wotug.ukc.ac.uk/parallel/transputer/software/ compilers/gcc/pereslavl/ It appears to compile the code to a pseudo-register machine, and then fixes up the code to be stack based afterwards. It takes advantage of some register stack features in gcc that egcs-jvm didn't use. The Transputer is still a traditional CPU despite being stack-based. The gcc port puts pointer values into integer pseudo-registers, which would violate the security requirements of IL. The i386 gcc port uses a regular set of registers for integer/pointer values, and a register stack for floating point values. The Transputer port uses two register stacks: one for integer/pointer values, and the other for floating point values. It may be possible to use three register stacks for IL: one for integer values, another for pointer values, and a third for floating point values. However, this still may not give a useful result. This fixes the security problems for the pseudo-registers, but it doesn't fix the security problems for memory. RTL assumes that main memory is a flat, untyped, address space, where any kind of value can be stored in any word. Partitioning main memory into separate types may not be possible without a rewrite of RTL. OK, so do something similar to gcj for C#. Use two code generators. That would work right? Yes it would, except for one small catch. Because there are so many people who don't understand how gcc works, they will assume that they can compile C and C++ to IL bytecode after we release the C# patches. Then they will discover that this isn't the case and will get extremely angry that we didn't build what they thought we were building. *sigh* Now matter how we attack the problem, we will end up having to write an IL bytecode backend for RTL, which is extremely difficult because of the various assumptions in the code. Realistically, someone with a great deal of gcc knowledge needs to go into the gcc core, rip RTL completely out, throw it away, and replace it with something that knows about both register machines and stack machines. Alternatively, someone could create a STL (Stack Transfer Language), that passes all languages through a separate code generator that knows about stack machines. Then we can write STL back-ends for IL and JVM bytecode. Both gcj and DotGNU would benefit from this. We're not buying it. It's not as hard as you think.? Fine. Prove us wrong. Download the gcc sources and have at it. The Transputer port may be a good place to start to get ideas, or it may not. </snip> JILC does not work and gcj does not compile C++ ... otherwise your idea seems ok to me >:-)... _______________________________________________ Mono-list maillist - [EMAIL PROTECTED] http://lists.ximian.com/mailman/listinfo/mono-list
