[Lazarus] Parser

Hans-Peter Diettrich Wed, 30 Jun 2010 03:28:37 -0700

In examples/parser/no_cpu you will find a new project that can be used
as a Pascal parser, or as a compiler template for a new CPU.


This project is a kind of plug-in for the FPC compiler, i.e. it uses the
original FPC parser (fpcsrc/compiler/p*). More projects will follow,
making the FPC parser usable inside other applications.

As the name "no_cpu" indicates, this project does not implement a
specific CPU, because code generation is not required for a universal
parser. The project name "ppcFromM68K" indicates that it is based on the
M68K compiler, and it (currently) requires a $DEFINE M68K, because all
acceptable CPUs are hard-coded in the FPC compiler code. In effect,
every CPU resides in its own subdirectory, which is added to the path
when the compiler is built and which contains several units with
predefined names and contents; see no_cpu/readme.txt for these details.

So far, these problems make the parser unusable in other projects:

1) The compiler requires that all used units can be found and are
compiled as well.

2) All units implicitly use System. So far I could not get the compiler
to find this unit, so the test project was named system.pas, to bypass
the search for the system unit in the compiler. A compiler switch (-s?)
may have the same result.

3) The compiler builds a parse tree for every procedure, but I have not
yet found a way to make this tree accessible. There should be a
method/procedure in the CPU-specific code that is called to create the
binary code for a procedure, but I could not yet locate it.

4) It's not yet known how the rest of a unit (declarations...) is
represented internally. Some tokens (comments...) are simply skipped,
which can be cured by modifications to the scanner.

5) Conditional compilation only processes one branch, and macros are
expanded. While macro expansion may be suppressed somehow, the
compilation of multiple conditional branches really doesn't make sense.
We'll have to find a way to submit exactly the defined symbols to the
compiler, so that the intended branches become part of the parse tree.
For the use of the parser in a syntax highlighter a different approach
must be chosen, one that allows classifying all tokens for the syntax
highlighter itself, and that also allows identifying sections,
procedures and blocks for folding and for the determination of e.g.
begin-end pairs.
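
To illustrate the single-branch behaviour described in item 5, here is a
minimal sketch. The symbol USE_FAST and the function Pick are made up
for this example and do not occur in the FPC sources. With the symbol
defined, only the first branch reaches the parser; the {$ELSE} branch
never becomes part of the parse tree:

```pascal
program CondDemo;
{$mode objfpc}{$H+}

{ USE_FAST is a hypothetical symbol, defined here for illustration only. }
{$DEFINE USE_FAST}

function Pick: string;
begin
  {$IFDEF USE_FAST}
  Result := 'fast';   { this branch is parsed and compiled }
  {$ELSE}
  Result := 'slow';   { this branch is skipped, the parser never sees it }
  {$ENDIF}
end;

begin
  WriteLn(Pick);
end.
```

On the command line the same symbol could be supplied with the
compiler's -d option (-dUSE_FAST) instead of the {$DEFINE} line.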


In detail, the last item [5] suggests a more flexible parser that can
do with the scanned tokens whatever is appropriate in the scope of a
specific application. The general solution is a separation of the
syntactic and semantic procedures in the parser. For fastest
processing the semantic code can be made selectable just as for the
CPU, by placing this code into a dedicated directory. I hope that this
solution is acceptable to the FPC maintainers, and I'm willing to
refactor all the parser units accordingly.

Another solution would use a Semantics class, with the benefit that
different trees can be built from the source code in one or more runs;
one such class could provide the classified tokens to a syntax
highlighter, another one could provide the block structure of the unit.
This solution can be derived from the procedural solution, when the
semantic procedures delegate all work to a supplied Semantics object,
or to multiple objects in parallel - nothing that would affect the
remaining compiler code at all.
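
The Semantics idea can be sketched roughly as follows. This is only a
toy model under stated assumptions: TSemantics, TTokenClassifier,
TBlockTracker and Parse are invented names, not existing FPC classes,
and the "parser" merely splits its input on blanks and recognizes
begin/end. It shows how one syntactic pass could feed several semantic
objects in parallel:

```pascal
program SemanticsSketch;
{$mode objfpc}{$H+}

uses SysUtils;

type
  { Hypothetical base class: the syntactic code calls these methods,
    descendants decide what to do with the information. }
  TSemantics = class
    procedure Token(const AText: string; IsKeyword: Boolean); virtual;
    procedure BlockBegin; virtual;
    procedure BlockEnd; virtual;
  end;

  { Counts token classes, roughly what a syntax highlighter needs. }
  TTokenClassifier = class(TSemantics)
    Keywords, Others: Integer;
    procedure Token(const AText: string; IsKeyword: Boolean); override;
  end;

  { Tracks begin/end nesting, roughly what code folding needs. }
  TBlockTracker = class(TSemantics)
    Depth, MaxDepth: Integer;
    procedure BlockBegin; override;
    procedure BlockEnd; override;
  end;

{ The default implementations ignore everything. }
procedure TSemantics.Token(const AText: string; IsKeyword: Boolean);
begin end;
procedure TSemantics.BlockBegin; begin end;
procedure TSemantics.BlockEnd; begin end;

procedure TTokenClassifier.Token(const AText: string; IsKeyword: Boolean);
begin
  if IsKeyword then
    Inc(Keywords)
  else
    Inc(Others);
end;

procedure TBlockTracker.BlockBegin;
begin
  Inc(Depth);
  if Depth > MaxDepth then
    MaxDepth := Depth;
end;

procedure TBlockTracker.BlockEnd;
begin
  Dec(Depth);
end;

{ Syntactic part: splits Source on blanks and reports every token
  to all supplied semantics objects. }
procedure Parse(const Source: string; const Sinks: array of TSemantics);
var
  i, j, Start: Integer;
  Tok: string;
begin
  i := 1;
  while i <= Length(Source) do
  begin
    Start := i;
    while (i <= Length(Source)) and (Source[i] <> ' ') do
      Inc(i);
    Tok := LowerCase(Copy(Source, Start, i - Start));
    Inc(i);                      { step over the blank that ended the token }
    if Tok = '' then
      Continue;                  { consecutive blanks yield no token }
    for j := Low(Sinks) to High(Sinks) do
    begin
      Sinks[j].Token(Tok, (Tok = 'begin') or (Tok = 'end'));
      if Tok = 'begin' then
        Sinks[j].BlockBegin
      else if Tok = 'end' then
        Sinks[j].BlockEnd;
    end;
  end;
end;

var
  AllSinks: array[0..1] of TSemantics;
  Classifier: TTokenClassifier;
  Blocks: TBlockTracker;
begin
  Classifier := TTokenClassifier.Create;
  Blocks := TBlockTracker.Create;
  AllSinks[0] := Classifier;
  AllSinks[1] := Blocks;
  Parse('begin x begin y end end', AllSinks);
  WriteLn('keywords: ', Classifier.Keywords, ', other tokens: ', Classifier.Others);
  WriteLn('max nesting depth: ', Blocks.MaxDepth);
  Blocks.Free;
  Classifier.Free;
end.
```

For the sample input this reports 4 keywords, 2 other tokens and a
maximum nesting depth of 2. Parse itself knows nothing about
highlighting or folding, so further Semantics descendants could be
added without touching it.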

One big advantage of the separation into syntactic and semantic
parts is the chance to add further languages to the compiler, as
selectable front-ends, which use the already existing classes and
procedures for code generation. Adding e.g. Oberon syntax should not
require many changes to the existing code, while other languages (C/C++)
would require adding and handling new node types during optimization and
code generation. But such extensions are beyond the scope of the current
parser examples.


Any comments and suggestions are welcome. If somebody wants to
contribute to this or related projects, I'll add the contribution or
apply the corresponding patches to the examples/parser tree. Any
assistance is welcome in finding the places where the existing compiler
code can be modified in order to overcome the aforementioned problems.
FPDoc documentation of the compiler will also be welcome...

DoDi


