On 11/20/12 6:54 AM, c...@lavabit.com wrote:
Hello,

I know nothing about compilers and interpreters. I checked several
books, but none of them explained why we have to translate a
high-level language into a small (core) language. Is it impossible
(very hard) to directly translate high-level language into machine
code?

It is possible to remove stages in the standard compilation pipeline, and doing so can speed up compilation time. For example, Perl doesn't build an abstract syntax tree (for now-outdated performance reasons), and instead compiles the source language directly into bytecode (which is then interpreted by the runtime). This is one of the reasons why Perl is (or was?) so much faster than other interpreted languages like Python etc. But there are some big problems to beware of:

* Not having a concrete representation for intermediate forms can rule out performing obvious optimizations. And I do mean *obvious* optimizations; I can talk more about this problem in Perl, if you really care.

* Not having a concrete representation for intermediate forms means mixing together code from many different stages of the compilation process. This sort of spaghetti code is hard to maintain, and even harder to explain to new developers.

* Not having a concrete representation for intermediate forms can lead to code duplication (in the compiler) because there's no convenient way to abstract over certain patterns. And, of course, repeating code is just begging for inconsistency bugs due to the maintenance burden of keeping all the copies in sync.

All three points are major driving forces in having intermediate forms. Joachim Breitner gave some illustrations for why intermediate forms are "inevitable". But then, once you have intermediate forms, if you're interested in ensuring correctness and having a formal(izable) semantics, then it makes sense to try to turn those intermediate forms into an actual intermediate language. Intermediate forms are just an implementation detail, but intermediate languages can be reasoned about in the same ways as other languages. So it's more about shifting perspective in order to turn systems problems (implementation details) into language problems (semantics of the Core).

Furthermore, if you're a PL person and really are trying to ensure correctness of your language (e.g., type safety), you want to try to make your proof obligation as small as possible. For convenience to programmers, source code is full of constructs which are all more or less equivalent. But this is a problem for making proofs because when we perform case analysis on an expression we have to deal with all those different syntactic forms. Whereas if you first compile everything down into a small core language, then the proof has far fewer syntactic forms it has to deal with and so the proof is much easier. Bear in mind that this isn't just a linear problem. If we have N different syntactic forms, then proving something like confluence will require proving O(N^2) cases since you're comparing two different terms.

--
Live well,
~wren

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to