Hi Folks,

If you aren't a C++ programmer, this may not be interesting.  If you are
a proselytising functional programming advocate, this may only reinforce
your preconceptions.

I am working on getting all of the cross assembler and cross linker
functionality into my UCSD Pascal cross compiler package.  This is
definitely a challenging portion of the project.  The sources of the
native UCSD Linker say "Abandon All Hope Ye Who Enter Here", and they
weren't kidding.

In order to get the cross compiler to emit linkage information for
external UNIT variables, I have to jigger the code generation to (a)
always produce a 2-byte Big reference (LAO, LDO, STO) and never optimise
to SLDO etc, and (b) emit linkage information.  The second part is the
easiest.  It is mangling the code generation for the special case of an
unknown global variable offset that is nasty.  And then, if that wasn't
enough, intrinsic units can have external data segments with completely
different opcodes (LAE, LDE, STE).

To explain the next bit, I have to back up to 2006, when I wrote a paper
called "Compilers and Factories" for LCA2007.  The central theme of that
paper was to pass decisions and control of how to manipulate the
expression trees to the expression tree objects.  An example:

The Pascal grammar has

        statement = expression ':=' expression

This results in an unhelpful "Syntax error" when, if like me you
have been coding in C or C++ for the intervening 30 years, you write

        x = 5;

In order to get a more helpful error message, the trick is to move the
error out of the grammar, and into the semantics.  (This is what C does,
except C is more generous.)  The grammar is changed to

        statement = expression

Of course, if the expression actually has a non-nothing result, it is an
error.  In the assignment example, above, the error message the cross
compiler gives is

        statement expression is a boolean value, it should be nothing;
        did you mean to use an assignment (written ":=") instead of an
        equality test (written "=")?

This is a much more helpful error message.
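
A minimal sketch of that semantic check, in C++.  The type_kind enum,
the cut-down expression struct and check_statement_expression() are
hypothetical names made up for illustration, not the cross compiler's
actual classes; only the wording of the message is taken from above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, much simplified stand-ins for the compiler's own types.
enum class type_kind { nothing, boolean, real };

struct expression
{
    type_kind type;        // type of the value the expression yields
    bool is_equality_test; // true when the root operator is "="
};

static std::string
type_name(type_kind t)
{
    switch (t)
    {
    case type_kind::boolean: return "boolean";
    case type_kind::real: return "real";
    default: return "nothing";
    }
}

// Returns an empty string when the expression may stand as a statement,
// otherwise the text of the error message to report.
std::string
check_statement_expression(const expression &e)
{
    if (e.type == type_kind::nothing)
        return "";
    std::string msg = "statement expression is a " + type_name(e.type)
        + " value, it should be nothing";
    if (e.is_equality_test)
    {
        // The extra hint is only useful when "=" was the root operator.
        msg += "; did you mean to use an assignment (written \":=\") "
            "instead of an equality test (written \"=\")?";
    }
    return msg;
}
```

The point is that the parser accepts the statement, and the complaint
is made later, when the type of the expression is known.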

But now we can have expressions on both the left and the right hand
sides of an assignment.  How do we know which opcodes (loads or stores)
to generate?  Well, the cross compiler uses abstract syntax trees,
rather than generating code as it is parsed... we have *much* more
memory to play with than the UCSD native Pascal compiler ever did.

To produce the assignment, the compiler could grope the left hand
expression, and do different code branches for global stores, array
index stores, record field stores, etc.  But the approach taken is
different: it simply asks the left hand side to turn itself into an
assignment.  The yacc grammar looks like this:

        expression: expression ASSIGN expression
          {
            $$ = $1->assignment_factory($3);
          }

As you can see, no groping of the left hand side ($1) is required.  The
default implementation of the virtual assignment_factory method is to
say "inappropriate assignment".  Classes that can be assigned to
override it: a simple variable load object creates a new variable store
expression object, an array index load object creates a new array store
expression object, etc.
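
A cut-down sketch of that arrangement follows.  Only the
assignment_factory name comes from the grammar above; the describe()
method, the expression_error class, the other class names and the use
of std::shared_ptr (rather than the boost pointers the real code uses)
are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <memory>
#include <string>

class expression
{
public:
    typedef std::shared_ptr<expression> pointer;
    virtual ~expression() {}

    // Text form of the tree, only so the sketch can be inspected.
    virtual std::string describe() const = 0;

    // Default: this kind of expression cannot be assigned to.
    virtual pointer assignment_factory(const pointer &rhs);
};

class expression_error: public expression
{
public:
    explicit expression_error(const std::string &m) : message(m) {}
    std::string describe() const override { return "error: " + message; }
private:
    std::string message;
};

expression::pointer
expression::assignment_factory(const pointer &)
{
    return std::make_shared<expression_error>("inappropriate assignment");
}

class expression_integer_constant: public expression
{
public:
    explicit expression_integer_constant(int v) : value(v) {}
    std::string describe() const override { return std::to_string(value); }
private:
    int value;
};

class expression_store_variable: public expression
{
public:
    expression_store_variable(const std::string &n, const pointer &r) :
        name(n), rhs(r) {}
    std::string describe() const override
    {
        return "store " + name + " := " + rhs->describe();
    }
private:
    std::string name;
    pointer rhs;
};

class expression_load_variable: public expression
{
public:
    explicit expression_load_variable(const std::string &n) : name(n) {}
    std::string describe() const override { return "load " + name; }

    // A variable load knows how to become a variable store.
    pointer assignment_factory(const pointer &rhs) override
    {
        return std::make_shared<expression_store_variable>(name, rhs);
    }
private:
    std::string name;
};
```

Asking a variable load for its assignment gives a store node; asking
an integer constant gives the "inappropriate assignment" error, without
the caller ever inspecting what kind of node it holds.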

The same technique can be used to handle array indexing, "dot"
expressions, and function and procedure calls.


Yes, but what does this have to do with variables?

The cross compiler's yacc grammar has a production like this:

        expression: NAME
            {
                $$ = name_expression_factory($1);
            }

This operates under the assumption that it is probably on the right hand
side, and generates load expressions.

This name_expression_factory did a chain of {if then else if then
else...} tests to decide what to do, all involving nasty C++ down casts,
which make my skin crawl, because too often they are a bug in hiding.
        
        expression::pointer
        name_expression_factory(symbol::pointer sp)
        {
            // Is it a constant?  Use its value directly.
            symbol_constant::pointer scp =
                boost::dynamic_pointer_cast<symbol_constant>(sp);
            if (scp)
            {
                return scp->get_value();
            }

            // Is it a variable?  Load from its address.
            symbol_variable::pointer svp =
                boost::dynamic_pointer_cast<symbol_variable>(sp);
            if (svp)
            {
                return
                    expression_load_indirect::create(
                        expression_address_local::create(
                            svp->get_offset()));
            }

            ...etc
            // uglier than this, but you get the idea
        }

Then (four years later) it occurred to me: let the symbol create the
name expression.

        expression::pointer
        name_expression_factory(symbol::pointer sp)
        {
            return sp->name_expression_factory();
        }

and move each test case into the implementation of
name_expression_factory() in the corresponding symbol derived class.
No down casts, either.
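
A minimal sketch of the class side of this, under stated assumptions:
the expression struct here is a stand-in that only carries a text form,
std::shared_ptr replaces the boost pointers, and the constructors and
data members are hypothetical.  Only the symbol, symbol_constant,
symbol_variable and name_expression_factory names come from the code
above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for an expression tree node: here just its text form.
struct expression
{
    typedef std::shared_ptr<expression> pointer;
    explicit expression(const std::string &t) : text(t) {}
    std::string text;
};

class symbol
{
public:
    typedef std::shared_ptr<symbol> pointer;
    virtual ~symbol() {}

    // Each kind of symbol builds the expression for a use of its name.
    virtual expression::pointer name_expression_factory() const = 0;
};

class symbol_constant: public symbol
{
public:
    explicit symbol_constant(int v) : value(v) {}
    expression::pointer name_expression_factory() const override
    {
        return std::make_shared<expression>(std::to_string(value));
    }
private:
    int value;
};

class symbol_variable: public symbol
{
public:
    explicit symbol_variable(int o) : offset(o) {}
    expression::pointer name_expression_factory() const override
    {
        return std::make_shared<expression>(
            "load_indirect(address_local(" + std::to_string(offset) + "))");
    }
private:
    int offset;
};

// What the grammar action calls: no down casts, just a virtual call.
expression::pointer
name_expression_factory(const symbol::pointer &sp)
{
    return sp->name_expression_factory();
}
```

Adding a new kind of symbol then means adding a class, not another
branch in the middle of a chain of tests.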


Yes, but what does that have to do with external linkage, or variables
in DATA segments?

Let's take the external DATA segment case first.  Further derive the
symbol_variable class, so that we have symbol_variable_external:

        expression::pointer
        symbol_variable_external::name_expression_factory()
        {
            return
                expression_load_indirect::create(
                    expression_address_external::create(segnum, offset));
        }

and the "normal" case of a function's local variables

        expression::pointer
        symbol_variable_local::name_expression_factory()
        {
            return
                expression_load_indirect::create(
                    expression_address_local::create(offset));
        }

etc.  There is extra machinery setting up the classes (and C++ is
hideously verbose in this regard) but once done, there are no more
type-flaky and expensive "what is this" tests, the code is more
readable, and it goes faster.


Yes, but how does that address the "global variable of unknown offset in
a (non-intrinsic) unit" case?

Another derived class, of course.

        expression::pointer
        symbol_variable_globref::name_expression_factory()
        {
            // the "globref" name is taken from the kind
            // of linkage information to emit.
            return
                expression_load_indirect::create(
                    expression_address_globref::create(name));
            // that "name" is the variable's name, an instance
            // variable of the symbol base class
        }

Now, the LDO=>SLDO optimisations are not done by the
expression_address_globref class (oh, um, did I mention that expression
objects know how to optimise themselves?) because we can't know whether
offset<=16.  And the opcode's Big offset is always generated as two
bytes, in case offset>=128, avoiding the usual optimising code path for
Big offsets.  The expression_address_global class, of course, continues
to optimise as before, because it *does* know its offset.
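
To make the difference concrete, here is a rough sketch of the two
code paths.  It is not the cross compiler's code.  The instruction
struct, encode_big(), load_global() and load_globref() are hypothetical
names, the opcodes are kept as mnemonic strings rather than numbers,
and the Big layout used (one byte for 0..127, otherwise two bytes, high
byte first with its top bit set) is stated as an assumption, as are the
lower bound of 1 for the short form and the zero placeholder left for
the linker.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct instruction
{
    std::string mnemonic;               // symbolic, no opcode numbers
    std::vector<std::uint8_t> operand;  // encoded operand bytes, if any
};

// Big operand (assumed format): 0..127 fits in one byte; larger values
// take two bytes, high byte first with its top bit set.  The two-byte
// form can be forced so the size does not depend on the value.
std::vector<std::uint8_t>
encode_big(unsigned value, bool force_two_bytes)
{
    std::vector<std::uint8_t> out;
    if (value < 128 && !force_two_bytes)
    {
        out.push_back(static_cast<std::uint8_t>(value));
        return out;
    }
    out.push_back(static_cast<std::uint8_t>(0x80 | ((value >> 8) & 0x7F)));
    out.push_back(static_cast<std::uint8_t>(value & 0xFF));
    return out;
}

// Global variable whose offset is known: free to pick the short forms.
instruction
load_global(unsigned offset)
{
    if (offset >= 1 && offset <= 16)
        return instruction{ "SLDO" + std::to_string(offset), {} };
    return instruction{ "LDO", encode_big(offset, false) };
}

// Global variable whose offset only the linker knows: always LDO with a
// two-byte operand (zero here as a placeholder) for the linker to patch.
instruction
load_globref()
{
    return instruction{ "LDO", encode_big(0, true) };
}
```

Because the globref form always occupies the same number of bytes, the
linker can patch in any offset later without moving the code around it.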


So that's all.  The lightning bolt was to realise that I wasn't using
some techniques I was already using elsewhere in the compiler, and that
inconsistency was making for painful thinking about how to solve the
problem in an elegant manner.  So painful that I worked on something
else for a while.

That is one of my favourite aspects of open source, especially on
projects I'm doing for myself: you can take the time to do it right.


Regards
Peter Miller <pmil...@opensource.org.au>
/\/\*        http://miller.emu.id.au/pmiller/

PGP public key ID: 1024D/D0EDB64D
fingerprint = AD0A C5DF C426 4F03 5D53  2BDB 18D8 A4E2 D0ED B64D
See http://www.keyserver.net or any PGP keyserver for public key.

"It's my crack pipe, and I can put anything in it I want
to."  -- Erik de Castro Lopo
