Graeme Geldenhuys asked in Vol 108, Issue 27, "What makes a compiler project 
special?"
Well, I'm not a member of the FPC team but I've worked on several compilers and 
I'll throw my 0.02 Euro into the discussion.
> Since Florian mentioned that a compiler project is "rocket science" [not his 
> direct words, but he hinted at that] and totally different to any other 
> software project... It has really bugged me... Why is it different, and What 
> is different?
I'm going to have to disagree here, and it may simply display my own ignorance 
of the subject, but, then again, even a stopped clock is right twice a day.
A compiler is a "language processor," an application that converts code in one 
language into something else. If it's a translating compiler it converts it to 
another language. If it's a language compiler it converts it to binary code or 
potentially to assembly language. (I'm making a bit of a distinction here: a 
compiler that emits assembly code isn't a "translator," because it uses the 
assembler only to avoid "reinventing the wheel" by writing its own object file 
writer, and because compilers generating assembly usually produce finished 
output requiring no manual intervention. Most translators that convert source 
from one high-level language to another produce results that require some 
manual correction. Few produce "perfect" high-level to high-level conversions; 
they'll do the "heavy lifting," but minor "tweaks" or checking by a person are 
often still required.) 

At its core, a language processor is a text processing application. It takes a 
fixed set of rules on what the programmer can and must "say" in order to 
specify the particular actions they want a program to accomplish. Given these 
rules, which are called a "grammar," the programmer describes the program, and 
the compiler takes that description and turns it into the target 
representation.
In the case of a translator, it produces a new program in a different language. 
Or it may be the same language but converted to a different dialect, such as a 
translation from a different Pascal compiler, or a conversion from HP Cobol to 
IBM Mainframe Cobol, or conversion from C or Fortran to something newer.
Many language processors have gone to using parser generators in order to 
reduce the work involved in scanning a source language. Some may simply do 
language scanning directly. Most older Pascal compilers used "symbol 
substitution": as the source was scanned, the scanner would create a symbol 
identifying what had been found, whether it was an unrecognized word (which 
would indicate a user identifier), a symbol (like :, >, /, comma, etc.) or a 
keyword (USES, UNIT, BEGIN, etc.). Then the internal "current symbol" was set 
to the value of that symbol. 

Most compilers had about a 1 byte lookahead so that they could determine 
whether they had a single byte symbol (comma, ^, or ' ) or a multibyte symbol, 
which may differ depending on the second byte (> followed by an identifier 
vs. >=, : vs :=, < vs <> or <=). Okay, all of this was reasonable until object 
orientation came into use.
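
The scanning scheme described above can be sketched in a few lines. This is 
only a minimal illustration in Python, not how FPC or any particular compiler 
does it; the names KEYWORDS, TWO_CHAR_SYMBOLS and scan are made up for the 
example, the keyword list is deliberately tiny, and string literals and 
comments are not handled at all.

```python
# Hypothetical, deliberately tiny keyword and symbol tables for illustration.
KEYWORDS = {"USES", "UNIT", "BEGIN", "END", "PROGRAM", "VAR"}
TWO_CHAR_SYMBOLS = {":=", ">=", "<=", "<>"}


def scan(source):
    """Yield (kind, text) pairs, peeking one character ahead for symbols."""
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
        elif ch.isalpha() or ch == "_":
            # A word: either a keyword or a user identifier.
            start = pos
            while pos < len(source) and (source[pos].isalnum() or source[pos] == "_"):
                pos += 1
            word = source[start:pos]
            kind = "keyword" if word.upper() in KEYWORDS else "identifier"
            yield kind, word
        elif ch.isdigit():
            start = pos
            while pos < len(source) and source[pos].isdigit():
                pos += 1
            yield "number", source[start:pos]
        else:
            # One-character lookahead: is this ':' really ':=' and so on?
            pair = source[pos:pos + 2]
            if pair in TWO_CHAR_SYMBOLS:
                yield "symbol", pair
                pos += 2
            else:
                yield "symbol", ch
                pos += 1


tokens = list(scan("begin x := y >= 10 end"))
```

Each pair that comes out plays the role of the "current symbol" that the 
parser then looks at.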
When one uses a variable, or a constant, which one are you using? Well, it 
depends on the "scope." If you have I defined in the main program (or in the 
definitions of a UNIT), your reference is to that one. If you're inside a 
procedure, function, method or other similar construct and you define I there, 
it uses that one. But what if your program - or UNIT - uses several other 
UNITs, each having a variable I defined? Which one does it use? The first one? 
The last one?
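
As far as I know, Free Pascal (like Delphi) searches the units in the uses 
clause from last to first, so the last unit listed wins. A minimal sketch of 
that lookup order, in Python, might look like the following. The function name 
resolve and the dictionary layout are assumptions for the example only; a real 
compiler keeps far richer symbol tables.

```python
def resolve(name, local_scope, unit_scope, used_units):
    """Find `name`, innermost scope first.

    local_scope and unit_scope are dicts of name -> value.
    used_units is a list of (unit_name, symbols) in uses-clause order.
    Returns (where_found, value).
    """
    if name in local_scope:
        return "local", local_scope[name]
    if name in unit_scope:
        return "unit", unit_scope[name]
    # Assumed rule: search the used units from last to first.
    for unit_name, symbols in reversed(used_units):
        if name in symbols:
            return unit_name, symbols[name]
    raise KeyError(f"identifier not found: {name}")


used = [("UnitA", {"I": 1}), ("UnitB", {"I": 2})]
from_units = resolve("I", {}, {}, used)
from_local = resolve("I", {"I": 0}, {"I": 5}, used)
```

With nothing defined locally, the I from UnitB is picked because UnitB comes 
last in the list; once a local I exists, it hides all the others.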
Now, the plot thickens if you reference an object. A method in that object can 
be static or virtual, in which case it may not be certain until execution time 
which one is being used, the method in the base class or an overridden one in a 
descendant object. So a compiler has to read the tables in a unit in order to 
discover what items are visible and where they are in that unit, and also to 
know what kind of variable (or procedure, or function) each one is and what is 
legal to do with it (you can't safely store a 64-bit integer in an 8-bit 
unsigned byte because it may not fit, but you can do it the other way around).
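
To make the "what is legal to do with it" part concrete, here is a minimal 
sketch in Python of a symbol entry that records a kind and a type, plus a range 
check between integer types. The names IntType, Symbol and fits_without_loss 
are invented for the example, and real Pascal type compatibility rules are a 
good deal more involved than a simple range comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    name: str
    low: int
    high: int


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: str        # e.g. "variable", "procedure", "function"
    type: IntType


BYTE = IntType("Byte", 0, 255)
INT64 = IntType("Int64", -2**63, 2**63 - 1)


def fits_without_loss(source, target):
    """True if every value of `source` is also a valid value of `target`."""
    return target.low <= source.low and source.high <= target.high


# A toy symbol table, as a compiler might build after reading a unit.
table = {
    "Small": Symbol("Small", "variable", BYTE),
    "Big": Symbol("Big", "variable", INT64),
}

byte_into_int64 = fits_without_loss(table["Small"].type, table["Big"].type)
int64_into_byte = fits_without_loss(table["Big"].type, table["Small"].type)
```

Here byte_into_int64 comes out True and int64_into_byte comes out False, which 
is the "other way around" asymmetry mentioned above.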

But this is still the translation of symbols and assigning them attributes, 
including whether they are a standalone item (like a unit), a dependent item 
(like a variable in a program) or an internal item (like a field in a record or 
a member of an object). It requires you to keep information about these things, 
but I don't think this is any worse than the work involved in a video game in 
holding state information about the game map, the player character (PC), 
non-player characters (NPCs), enemies, and objects the player can hold (guns, 
Portal Device, radio) or use or consume (money, ammo, health).
The last time I did a compile of the full compiler, on a reasonable machine 
maybe a year or two ago, it was about 262,000 lines, probably not including 
run-time libraries, and took an amazingly fast 13 seconds. In the end, it's 
still a text processor which takes the explanation of what the programmer 
thinks the program is to do and translates it into a means to execute that 
explanation.
Even so, I'm sure it does not rise to the level of complexity of other types of 
applications involving other fields even if those programs are smaller in size. 
I suspect chemical analysis or actual programs involving real "rocket science" 
are considerably more complicated.
Let's put it at the level of a word processor, which might have to do a lot of 
similar things, such as process a document and redline the misspelled words, or 
even "compile" the formatted document into a PDF. But maybe that's too 
different a comparison, as word processors do other things to documents. 
However, I am trying to explain why a compiler application, while having some 
complexity, really isn't all that different from a typical "ordinary" 
application such as a word processor or other application most people deal with 
every day.
And it is probably a lot less complex, too.
 
Paul 
Paul Robinson <p...@paul-robinson.us> - http://paul-robinson.us (My blog)
"The lessons of history teach us - if they teach us anything - that no one 
learns the lessons that history teaches us."
