On May 21, 2008, at 7:16 PM, Greg Ewing wrote:

> Dag Sverre Seljebotn wrote:
>> The big refactoring you refer to is related to
>> separating the type analysis and type coercion phases
>
> I'm not sure whether you would gain much by doing this.
> The place where you discover that a coercion is needed
> is during type analysis, where you look at the types of
> the operands, decide whether they're compatible, and
> see whether they need to be converted to a common type,
> etc. Also you work out what the result type will be and
> annotate the node with it.
>
> To move the creation of coercion nodes into a separate
> pass, you would have to leave an annotation on the node
> saying that a coercion is needed. But if some phase
> between then and doing the actual coercions rearranges
> the parse tree, these annotations may no longer be
> correct.
>
> So you would have to disallow any other phases between
> type analysis and coercion, or at least put some
> restrictions on what they can do, such as not altering
> the structure of the parse tree, or doing anything
> else that could invalidate the calculated types.
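Greg's description of coercion during type analysis can be sketched roughly as
follows. All class names, the string-based type representation, and the toy
promotion lattice here are illustrative assumptions, not Cython's actual code:

```python
class Node:
    pass

class NameNode(Node):
    def __init__(self, name, type_):
        self.name = name
        self.type = type_

class CoerceNode(Node):
    """Explicit conversion node inserted by the analyser."""
    def __init__(self, operand, to_type):
        self.operand = operand
        self.type = to_type

# toy numeric-promotion lattice
RANK = {"int": 0, "long": 1, "double": 2}

def common_type(a, b):
    return a if RANK[a] >= RANK[b] else b

def coerce_to(node, type_):
    # only create a coercion node when one is actually needed
    return node if node.type == type_ else CoerceNode(node, type_)

class BinopNode(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.type = None

    def analyse_types(self):
        # decide the common type, wrap any operand that differs,
        # and annotate this node with the result type
        result = common_type(self.left.type, self.right.type)
        self.left = coerce_to(self.left, result)
        self.right = coerce_to(self.right, result)
        self.type = result
        return self
```

For example, analysing `i + x` where `i` is an `int` and `x` a `double` wraps
only the `int` operand in a `CoerceNode` and types the result `double` --
which is exactly the information a deferred coercion pass would have to carry
as annotations and keep valid across any intervening tree rewrites.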
One would require that transformations done at this point leave the
tree in a correct state. As for the separation of coercion, rather
than marking a node as needing coercion on the first pass, I would
decide whether or not it needs coercion (by looking at the types)
right before actually creating the coerce nodes.

> What sort of things were you intending to do in between
> type analysis and coercion? Could they still be done under
> these restrictions?

The phase that I'd like to stick here is type inference. The type
analysis phase would type all declared variables, and in some cases
assign types that depend on other (as yet unknown) types. One would
then run a type-resolution algorithm over the data in the symbol
table, which would resolve all the remaining types.

> More generally, sometimes trying to split things up into
> more phases can make things more complicated rather than
> simpler. As an example of this, I'm currently thinking
> about eliminating the allocate_temps subphase of expression
> analysis and combining it with the code generation phase.
>
> The reason is that there's currently a rather non-obvious
> dependency between these phases. The order in which temp
> variables are allocated and released during allocate_temps
> has to exactly match the order in which code is generated
> that creates and disposes of the references that will be
> put in those temps. This makes it rather tricky to both
> write and maintain code for these two phases.
>
> The reason they're separate phases at the moment is that
> I was initially writing the generated code directly to the
> output file, so I had to know what temp variable declarations
> would be needed before starting to write any of the
> body code for a function.
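The type-resolution step Robert proposes above, run over the symbol table
after the first typing pass, could look roughly like this. The `Dependent`
placeholder and the flat dict-based symbol table are assumptions made for
illustration, not Cython's actual representation:

```python
class Dependent:
    """Placeholder: the type of this entry is whatever `on` resolves to."""
    def __init__(self, on):
        self.on = on

def resolve_types(symtab):
    # Iterate to a fixpoint: each pass replaces any Dependent entry
    # whose target has already been resolved to a concrete type.
    changed = True
    while changed:
        changed = False
        for name, entry in symtab.items():
            if isinstance(entry, Dependent):
                target = symtab[entry.on]
                if not isinstance(target, Dependent):
                    symtab[name] = target
                    changed = True
    unresolved = [n for n, e in symtab.items() if isinstance(e, Dependent)]
    if unresolved:
        raise TypeError("could not infer types for: " + ", ".join(unresolved))
    return symtab
```

So `{"i": "int", "j": Dependent("i"), "k": Dependent("j")}` resolves every
entry to `"int"`, while a cycle of mutually dependent entries is reported as
an inference failure rather than looping forever.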
> However, I'm currently writing
> the declaration and executable code to separate buffers and
> combining them afterwards, so there's probably no need for a
> separate allocate_temps pass any more, and combining it
> with code generation is likely to simplify quite a lot
> of things.

This actually sounds like a very good idea.

>> It would be better (see below) to have "deeper" cuts, i.e. so that one
>> could say "now the entire tree has been analysed", "now the entire tree
>> is ready for code generation", rather than some parts (functions etc.)
>> being in a separate state.
>
> The reason for doing functions that way is that it seemed
> wasteful to keep all the symbol tables for the local
> scopes around longer than necessary.
>
> That decision was probably influenced by an earlier project
> in which I wrote a compiler for a Modula-like language that ran
> on machines much smaller than we have today. It used a 3-pass
> arrangement that kept all the symbol tables for everything
> between passes, with the result that it could only compile
> a module a few hundred lines long before running out of memory.
>
> That experience gave me an appreciation of why Wirth prefers
> to write single-pass compilers.
>
> Although the memory issue probably isn't a concern nowadays, the
> aforementioned experience led me to approach the problem with the
> mindset of using as few passes as possible. The only reason I used
> separate analysis and generation phases at all in the beginning
> was so that you can refer to C functions that are defined further
> down without needing forward declarations.

Certainly memory (at this level) isn't nearly as tight as it once was.
One advantage of lots of passes is the ability to more easily insert
(optional) optimization passes.

>> Consider for instance trivial inner functions. One
>> natural way to implement this is just "throwing the function out"
>> ...
>> so you *somehow* need an ugly kludge to
>> get around this, and spend time thinking about that
>
> I don't think it would be all that difficult to make a pass
> over the function body just before generating code for it,
> that generates code for any nested functions.
>
> In fact, I have a suspicion that for the special case you're
> talking about (no references to intermediate scopes) it would
> "just work", because the analysis phase will already have
> generated an appropriately-mangled C name for the function.
>
> On the other hand, "throwing the function out" presents
> difficulties of its own. As you mentioned, some kind of name
> mangling would need to be done, which requires knowing which
> names are supposed to refer to that function, so you need
> some kind of symbol table functionality available in the
> pass where you do the throwing-out.
>
> The obvious thing is to use the real symbol table, but
> that means doing it after the declaration analysis phase,
> by which time the only problem remaining is how to get the
> C code generated in the right place -- which as I've
> said isn't really all that hard.

+1. I think "throwing the function out" presents more difficulties
than handling it right before the code generation phase. Conceptually
it would be easier to say things like "all declarations in the tree
have now been processed", though.

>> (In reality there'll be full closures instead, which just means
>> generating a "cdef class" (with state) at module scope rather than a
>> function. I'd like to see code doing that in less than 150 lines
>> without using any of my "new structure" proposals...)
>
> I'll be impressed if you can do it in 150 lines whatever
> structure you're using.
> But in any case, I suspect that
> the hardest part of this will be coping with all the
> nested scope issues, and that it will be easiest to do
> that while the parse tree still reflects the lexical
> structure of the original code.
>
>> Consider for instance the "with" statement... the only "natural"
>> way to implement it within the current structure is to implement
>> "WithStatementNode"
>
> Not necessarily. It may well be feasible to implement it
> by assembling existing nodes, and I'll be looking into that.

Yes, this would basically be manually implementing the transformation
in the parser module.

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
