On 2012-12-18 01:13, H. S. Teoh wrote:
The problem is not so much the structure preprocessor -> compiler -> assembler -> linker; the problem is that these logical stages have been arbitrarily assigned to individual processes residing in their own address space, communicating via files (or pipes, whatever it may be).

The fact that they are separate processes is in itself not that big of a problem, but the fact that they reside in their own address space is a big problem, because you cannot pass any information down the chain except through rudimentary OS interfaces like files and pipes. Even that wouldn't have been so bad, if it weren't for the fact that user interface (in the form of text input / object file format) has also been conflated with program interface (the compiler has to produce the input to the assembler, in *text*, and the assembler has to produce object files that do not encode any direct dependency information because that's the standard file format the linker expects).

Now consider if we keep the same stages, but each stage is not a separate program but a *library*. The code then might look, in greatly simplified form, something like this:

    import std.stdio;
    import libdmd.compiler;
    import libdmd.assembler;
    import libdmd.linker;

    void main(string[] args) {
        // typeof(asmCode) is some arbitrarily complex data
        // structure encoding assembly code, inter-module
        // dependencies, etc.
        auto asmCode = compiler.lex(args)
            .parse()
            .optimize()
            .codegen();

        // Note: no stupid redundant convert to string, parse,
        // convert back to internal representation.
        auto objectCode = assembler.assemble(asmCode);

        // Note: linker has direct access to dependency info,
        // etc., carried over from asmCode -> objectCode.
        auto executable = linker.link(objectCode);
        auto output = File(outfile, "w");
        executable.generate(output);
    }

Note that the types of asmCode, objectCode, and executable are arbitrarily complex, and may contain lazily evaluated data structures, references to on-disk temporary storage (for large projects you can't hold everything in RAM), etc. Dependency information in asmCode is propagated to objectCode, as necessary. The linker has full access to all info the compiler has access to, and can perform inter-module optimization, etc., by accessing information available to the *compiler* front-end, not just some crippled object file format.

The root of the current nonsense is that perfectly fine data structures are arbitrarily required to be flattened into some kind of intermediate form, written to some file (or sent down some pipe), often with loss of information, then read from the other end, interpreted, and reconstituted into other data structures (with incomplete info), then processed. In many cases, information that didn't make it through the channel has to be reconstructed (often imperfectly), and then used. Most of these steps are redundant. If the compiler data structures were already directly available in the first place, none of this baroque dance would be necessary.

I couldn't agree more.

-- 
/Jacob Carlborg
