Dag Sverre Seljebotn wrote:
> The big refactoring you refer to is related to 
> separating the type analysis and type coercion phases

I'm not sure whether you would gain much by doing this.
The place where you discover that a coercion is needed
is during type analysis, where you look at the types of
the operands, decide whether they're compatible, and
see whether they need to be converted to a common type,
etc. Also you work out what the result type will be and
annotate the node with it.
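As a rough illustration of the point (a hypothetical Python sketch with
made-up node and type names, not the actual Pyrex/Cython code), the place
where the common type is computed is also the natural place to wrap the
operands:

```python
# Hypothetical sketch of single-pass type analysis; the node and type
# names are invented for illustration, not taken from the real compiler.

class CoercionNode:
    """Wraps a child node and converts its value to `type`."""
    def __init__(self, child, type):
        self.child = child
        self.type = type

class ConstNode:
    def __init__(self, type):
        self.type = type
    def analyse_types(self, env):
        pass

def common_type(t1, t2):
    # Toy promotion rule: 'double' wins over 'int'.
    return "double" if "double" in (t1, t2) else t1

class AddNode:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def analyse_types(self, env):
        self.left.analyse_types(env)
        self.right.analyse_types(env)
        # The common type is discovered right here, so inserting the
        # coercion nodes here costs nothing extra...
        self.type = common_type(self.left.type, self.right.type)
        if self.left.type != self.type:
            self.left = CoercionNode(self.left, self.type)
        if self.right.type != self.type:
            self.right = CoercionNode(self.right, self.type)

expr = AddNode(ConstNode("int"), ConstNode("double"))
expr.analyse_types(env={})
# ...whereas a separate coercion pass would have to rediscover it from
# an annotation that any intervening tree rewrite could invalidate.
```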

To move the creation of coercion nodes into a separate
pass, you would have to leave an annotation on the node
saying that a coercion is needed. But if some phase
between then and doing the actual coercions rearranges
the parse tree, these annotations may no longer be
correct.

So you would have to disallow any other phases between
type analysis and coercion, or at least put some
restrictions on what they can do, such as not altering
the structure of the parse tree, or doing anything
else that could invalidate the calculated types.

What sort of things were you intending to do in between
type analysis and coercion? Could they still be done under
these restrictions?

More generally, sometimes trying to split things up into
more phases can make things more complicated rather than
simpler. As an example of this, I'm currently thinking
about eliminating the allocate_temps subphase of expression
analysis and combining it with the code generation phase.

The reason is that there's currently a rather non-obvious
dependency between these phases. The order in which temp
variables are allocated and released during allocate_temps
has to exactly match the order in which code is generated
that creates and disposes the references that will be
put in those temps. This makes it rather tricky to both
write and maintain code for these two phases.
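A toy sketch of why the ordering is so fragile (assumed names, not the
real code): a temp pool reuses released slots, so which name a value
gets depends on the exact allocate/release sequence:

```python
# Toy temp pool; because released slots are reused, the allocation
# order in allocate_temps must exactly mirror the order in which the
# generated code creates and disposes of references.

class TempPool:
    def __init__(self):
        self.free = []
        self.count = 0

    def allocate(self):
        if self.free:
            return self.free.pop()
        self.count += 1
        return "__tmp%d" % self.count

    def release(self, name):
        self.free.append(name)

pool = TempPool()
a = pool.allocate()          # __tmp1
b = pool.allocate()          # __tmp2
pool.release(a)
c = pool.allocate()          # reuses __tmp1
# If code generation released `a` one step later than allocate_temps
# did, `c` would instead be a fresh __tmp3, and the two phases would
# disagree about which C variable holds which value.
```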

The reason they're separate phases at the moment is that
I was initially writing the generated code directly to the
output file, so I had to know what temp variable declarations
would be needed before starting to write any of the
body code for a function. However, I'm currently writing
the declaration and executable code to separate buffers and
combining them afterwards, so there's probably no need for a
separate allocate_temps pass any more, and combining it
with code generation is likely to simplify quite a lot
of things.
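A minimal sketch of the two-buffer arrangement (invented names; the real
writer is certainly more elaborate): temps can be declared lazily, at the
moment code generation needs them, because the declaration buffer is only
stitched in front of the body at the end:

```python
import io

# Sketch: declarations and executable code go to separate buffers and
# are combined afterwards, so no up-front allocate_temps pass is needed.

class FunctionWriter:
    def __init__(self):
        self.decls = io.StringIO()
        self.body = io.StringIO()
        self.temp_count = 0

    def new_temp(self, ctype):
        # Declare a temp at the moment code generation asks for it.
        self.temp_count += 1
        name = "__tmp%d" % self.temp_count
        self.decls.write("    %s%s;\n" % (ctype, name))
        return name

    def putln(self, line):
        self.body.write("    %s\n" % line)

    def result(self):
        # Combine the buffers afterwards: declarations first.
        return "{\n%s%s}\n" % (self.decls.getvalue(), self.body.getvalue())

w = FunctionWriter()
t = w.new_temp("PyObject *")
w.putln("%s = PyNumber_Add(a, b);" % t)
code = w.result()
```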

> It would be better (see below) to have "deeper" cuts, i.e. so that one 
> could say "now the entire tree has been analysed", "now the entire tree 
> is ready for code generation", rather than some parts (functions etc.) 
> being in a separate state.

The reason for doing functions that way is that it seemed
wasteful to keep all the symbol tables for the local
scopes around longer than necessary.

That decision was probably influenced by an earlier project
in which I wrote a compiler for a Modula-like language that ran
on machines much smaller than we have today. It used a 3-pass
arrangement that kept all the symbol tables for everything
between passes, with the result that it could only compile
a module a few hundred lines long before running out of memory.

That experience gave me an appreciation of why Wirth prefers
to write single-pass compilers.

Although the memory issue probably isn't a concern nowadays, the
aforementioned experience led me to approach the problem with the
mindset of using as few passes as possible. The only reason I used
separate analysis and generation phases at all in the beginning
was so that you can refer to C functions that are defined further
down without needing forward declarations.

> Consider for instance trivial inner functions. One
> natural way to implement this is just "throwing the function out"
> ...
> so you *somehow* need an ugly kludge to
> get around this, and spend time thinking about that

I don't think it would be all that difficult to make a pass
over the function body just before generating code for it,
that generates code for any nested functions.

In fact, I have a suspicion that for the special case you're
talking about (no references to intermediate scopes) it would
"just work", because the analysis phase will already have
generated an appropriately-mangled C name for the function.

On the other hand, "throwing the function out" presents
difficulties of its own. As you mentioned, some kind of name
mangling would need to be done, which requires knowing which
names are supposed to refer to that function, so you need
some kind of symbol table functionality available in the
pass where you do the throwing-out.

The obvious thing is to use the real symbol table, but
that means doing it after the declaration analysis phase,
by which time the only problem remaining is how to get the
C code generated in the right place -- which as I've
said isn't really all that hard.
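A toy illustration of that point (the mangling scheme here is
hypothetical, not the real one): once declaration analysis has recorded
a unique C name for each function, a nested function already has a
module-level identity, and "throwing it out" is mostly a question of
where its code gets emitted.

```python
# Hypothetical mangling sketch -- not the actual Pyrex/Cython scheme.

def mangle(scope_path, name):
    return "__pyx_f_" + "_".join(scope_path + [name])

symbol_table = {}

def declare_function(scope_path, name):
    # Declaration analysis records a mangled C name for every function.
    key = tuple(scope_path + [name])
    symbol_table[key] = mangle(scope_path, name)

declare_function(["mymodule"], "outer")
declare_function(["mymodule", "outer"], "inner")

# The nested function already has a C name that is valid at module
# scope, so moving its code there needs no further renaming.
inner_cname = symbol_table[("mymodule", "outer", "inner")]
```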

> (In reality there'll be full closures instead, which just means 
> generating a "cdef class" (with state) at module scope rather than a 
> function. I'd like to see code doing that in less than 150 lines without 
> using any of my "new structure" proposals...)

I'll be impressed if you can do it in 150 lines whatever
structure you're using. But in any case, I suspect that
the hardest part of this will be coping with all the
nested scope issues, and that it will be easiest to do
that while the parse tree still reflects the lexical
structure of the original code.
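A rough Python model of the quoted idea (the names here are invented):
the inner function becomes a callable object at module scope whose
attributes carry the captured state -- the moral equivalent of a "cdef
class" with state.

```python
# Sketch of closure-as-class conversion; _closure_inner stands in for
# the "cdef class" that would be generated at module scope.

class _closure_inner:
    def __init__(self, x):
        self.x = x               # variable captured from outer's scope

    def __call__(self, y):
        return self.x + y

def outer(x):
    # original source:  def inner(y): return x + y
    #                   return inner
    return _closure_inner(x)

add5 = outer(5)
```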

> Consider for instance the "with" statement... the only "natural" 
> way to implement it within the current structure is to implement 
> "WithStatementNode"

Not necessarily. It may well be feasible to implement it
by assembling existing nodes, and I'll be looking into that.
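For context, the desugaring in PEP 343 already expresses "with" in terms
of constructs a compiler will have nodes for. A hand-expanded Python
model of `with mgr: body(value)` (function names invented here):

```python
import sys

# Hand-expanded form of `with mgr: body(value)`, following the PEP 343
# desugaring -- the kind of try/except/else assembly of existing node
# kinds that an implementation without a WithStatementNode would emit.

def run_with(mgr, body):
    value = mgr.__enter__()
    try:
        body(value)
    except BaseException:
        # __exit__ may suppress the exception by returning true.
        if not mgr.__exit__(*sys.exc_info()):
            raise
    else:
        mgr.__exit__(None, None, None)

events = []

class Recorder:
    def __enter__(self):
        events.append("enter")
        return 42

    def __exit__(self, exc_type, exc_val, exc_tb):
        events.append("exit")
        return False

run_with(Recorder(), lambda value: events.append(value))
```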

> [1] See ParallelAssignmentNode in Nodes.py for an example -- excellent 
> kludge

How so? It seems like exactly the sort of transformation
you're advocating, the only difference being that it's
implemented by code in Parser.py rather than in a separate
pass.

> the parser has to output stuff in *one certain way*

I'm not sure what you mean by that. The parser doesn't
*have* to perform that transformation -- it's just an
optimisation. It could just leave parallel assignments
alone and let the back end generate a tuple packing and
unpacking operation. It would work, albeit less
efficiently.

If you're referring to the fact that the subnodes of a
ParallelAssignmentNode have to be simple assignments, that's
the whole purpose of the transformation -- to reduce a
parallel assignment to a set of non-parallel ones. If
one of the sub-assignments is itself a parallel assignment,
it gets expanded as well, until only non-parallel
ones are left.

If you're wondering why the ParallelAssignmentNode exists
at all, it's because you can't just do the assignments
one after another -- you have to evaluate all the right
hand sides before assigning to any of the left hand
sides.
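The semantics the node has to preserve can be sketched in a few lines
(a toy interpreter-style model, not compiler code):

```python
# Every right-hand side is evaluated into a temp before any left-hand
# side is assigned -- the whole reason a plain sequence of simple
# assignments won't do.

def parallel_assign(env, targets, exprs):
    temps = [expr(env) for expr in exprs]    # phase 1: all RHS first
    for name, value in zip(targets, temps):  # phase 2: then the LHS
        env[name] = value

env = {"a": 1, "b": 2}
# a, b = b, a
parallel_assign(env, ["a", "b"], [lambda e: e["b"], lambda e: e["a"]])
# Naive sequential assignment (a = b; b = a) would end with a == b == 2.
```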

It's possible that a more general set of primitive nodes
can be found that enables this to be expressed without
needing a special ParallelAssignmentNode. I may revisit
this after seeing what needs to be done for the with
statement.

> and in a way which is unnatural for the parser, with a "local" view

Not sure what you mean by that, either. The parser has
to know that the ParallelAssignmentNode exists, and what
kind of subnodes it expects. But whatever phase performed
the transformation would need the same knowledge. Neither
of them has to know how those nodes work internally.

Sorry this reply was so long, I didn't have time to
write a shorter one. :-)

-- 
Greg
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
