Martin C. Martin wrote:
> Hi all,
>
> Recently there was some mention of using the visitor pattern to visit
> every node, versus a recursive overloaded function. What do people see
> as the advantage of one over the other? The recursive way seems
> simpler, but recursion in general is confusing to many, so the visitor
> pattern might be easier to understand. Is there any other trade off?
>
Thanks for asking :-) I'm the one who's been pushing for this, and I
believe I have good reasons.
I'll need to break the question up into pieces. (I've been moving, so I'll
be unresponsive for a while, but when I do answer I'll write proper answers.)
First off, this is NOT about recursive vs. non-recursive -- when using
the visitor pattern, one does of course still have to care deeply about
recursion. But that doesn't matter at all: if recursion is something one
actually has to stop and think about, one shouldn't touch Cython code
(as Greg says). So this is just confusion coming from my use of
imprecise language. Sorry.
An answer to Stefan: The big refactoring you refer to is the one
separating the type analysis and type coercion phases (both currently
happening in analyse_phase). Do you have any other concrete suggestions
about how to do this? How could one implement, say, NumPy support or type
inference without it? (In ANY way -- i.e. there is a "logical proof" that
any language feature allowing the NumPy functionality I'll be
implementing will need this refactoring.) Of course, how the refactoring
happens depends on how *new* code should be written.
Back to the approach I'm pushing for. It actually has two orthogonal parts:
1) More "phases", and "deeper cuts" in the recursive process (i.e. have
many recursive processes with simple flow, rather than one big one with
complex flow).
More phases is the important part: as noted above, a lot of stuff
simply cannot be done until type analysis and type coercion are split
into separate phases. This is the only nasty refactoring job in my
proposals.
As for deeper cuts: Parts of what the current recursive call chain does
can be sketched out like this:
analyse_module_level
generate_module_code
for each function:
    analyse_function_code
    generate_function_code
It would be better (see below) to have "deeper" cuts, i.e. so that one
could say "now the entire tree has been analysed" or "now the entire tree
is ready for code generation", rather than some parts (functions etc.)
being in a separate state. It's just easier conceptually to always know
what state one is in, and to be sure that whatever you need has been
generated before you need it.
I'll give an example: consider trivial inner functions. One natural way
to implement these is just "throwing the function out", i.e. extracting
the function and putting it in module scope (with name mangling etc.),
and then letting the rest of the system deal with it. That way, you don't
need any code specific to inner functions anywhere but in one place.
However, if you detect the types etc. you need for the inner function at
line 4 of the above pseudo-code, then you cannot depend on
analyse_module_level to do anything about your new global function,
because that has already run; so you *somehow* need an ugly kludge to
get around this, and spend time thinking about that (I have no idea how
one would do it) rather than doing "real work" that doesn't come as a
penalty of how the program is structured.
(In reality there'll be full closures instead, which just means
generating a "cdef class" (with state) at module scope rather than a
function. I'd like to see code doing that in less than 150 lines without
using any of my "new structure" proposals...)
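The "throwing the function out" idea could be sketched roughly like this
(FuncDef and its fields are invented stand-ins for illustration, not
Cython's actual AST classes; nested nesting levels and closures are
ignored):

```python
# Hypothetical sketch of a transform that lifts inner functions to
# module scope with name mangling, so no later phase needs special
# code for inner functions. Class and field names are invented.

class FuncDef:
    def __init__(self, name, inner_defs=None):
        self.name = name
        self.inner_defs = inner_defs or []

def lift_inner_functions(module_defs):
    """Return a flat list of defs, with inner ones renamed and hoisted."""
    result = []
    for func in module_defs:
        for inner in func.inner_defs:
            inner.name = "%s__%s" % (func.name, inner.name)  # mangle
            result.append(inner)
        func.inner_defs = []
        result.append(func)
    return result

defs = [FuncDef("outer", [FuncDef("helper")])]
print([f.name for f in lift_inner_functions(defs)])
# ['outer__helper', 'outer']
```

The point is that this only works if module-level analysis has not yet
run when the transform fires -- which is exactly the ordering that
deeper cuts guarantee.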
2) Transforms. Since it's you, Martin: think of them as internal Cython
macros on the tree. Rather than implementing every feature all the way
from the parser end to the C generation end, it should be possible to
implement a feature by writing a simple macro that transforms it into
other constructs.
With the current approach, the big problem is that the structure of the
tree is fixed (it is of course possible to work around this, but in
practice it isn't done, because it is so kludgy). And the problem with
that is: Python code and C code do not map 1:1 in structure. The result
is lots of kludges to get around this [1], and trivial things
suddenly become non-trivial.
Consider for instance the "with" statement. Its PEP specification is
given as an equivalent Python code fragment! Still, the only "natural"
way to implement it within the current structure is to implement a
"WithStatementNode" and keep it around in the tree (in some form) all
the way until C code generation.
With transforms, you'd instead write a (recursive!) macro to handle with
statements (simply turning them into the equivalent Python fragment).
With my proposals there'd be a top-level "pipeline" list like this:
pipeline = [parse, type_analysis, coerce, generate_C]  # very simplified
Now, you can simply do
if should_support_with_statements:
    pipeline.insert(1, with_statement)
else:
    pipeline.insert(1, raise_error_on_with_statement)
And 1) the remaining phases (type analysis etc.) never even need to
know that a WithStatementNode exists, and 2) the parser doesn't need
to care about it either, beyond simply turning the Cython code into an
object representation (it doesn't need any "business logic" for the
with statement).
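The with-statement transform could look roughly like this (the node
classes here are invented stand-ins, not Cython's actual AST, and the
full error handling from PEP 343 is omitted):

```python
# Hedged sketch: a transform that replaces a hypothetical WithNode with
# its try/finally expansion, so no later phase ever sees a WithNode.

class WithNode:
    def __init__(self, manager, body):
        self.manager, self.body = manager, body

class TryFinallyNode:
    def __init__(self, body, finally_body):
        self.body, self.finally_body = body, finally_body

def expand_with(node):
    """Expand a WithNode into try/finally; pass other nodes through."""
    if isinstance(node, WithNode):
        # Roughly the PEP 343 expansion: __enter__ before the body,
        # __exit__ in the finally clause (exception plumbing omitted).
        return TryFinallyNode(
            body=["%s.__enter__()" % node.manager] + node.body,
            finally_body=["%s.__exit__()" % node.manager])
    return node

tree = expand_with(WithNode("mgr", ["work()"]))
print(type(tree).__name__)  # TryFinallyNode
```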
What does the visitor pattern have to do with any of this? It's the most
natural way to implement stuff like this. But note: it is not the
important part. Visitors are really "low-level"; it's sort of like asking
"do you prefer for-loops or while-loops for this kind of task", while the
stuff mentioned above is more like "do you prefer a microkernel or a
monolithic kernel"... :-)
Example: Without visitors, in order to implement a Cython code
serializer (to reserialize the tree) one would have to add a
"serialize_to_cython" method to every node class. (Imagine then coming
along to add "serialize_to_xml" later...) With visitors, one simply
creates an isolated CodeWriter.py file, and nothing else needs to know
anything about it -- it allows clear dependencies and compartmentalization.
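A minimal sketch of that idea (class names invented for illustration;
Cython's real node classes look different): the traversal dispatch lives
in one base class, and a CodeWriter can then live in its own file with
the node classes knowing nothing about it.

```python
# Hypothetical visitor sketch: dispatch on the node's class name, with
# a generic fallback that just recurses into the children.

class Node:
    child_attrs = []

class ModuleNode(Node):
    child_attrs = ["body"]
    def __init__(self, body):
        self.body = body

class PrintNode(Node):
    def __init__(self, arg):
        self.arg = arg

class TreeVisitor:
    def visit(self, node):
        # Look up visit_<ClassName>, falling back to generic_visit.
        method = getattr(self, "visit_" + type(node).__name__,
                         self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        for attr in node.child_attrs:
            for child in getattr(node, attr):
                self.visit(child)

class CodeWriter(TreeVisitor):
    # Lives in its own module; no node class needs a serialize method.
    def __init__(self):
        self.lines = []
    def visit_PrintNode(self, node):
        self.lines.append("print %s" % node.arg)

w = CodeWriter()
w.visit(ModuleNode([PrintNode("x"), PrintNode("y")]))
print(w.lines)  # ['print x', 'print y']
```

Adding a serialize_to_xml later would then be another isolated subclass,
not a change to every node.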
I can write more on this later but this should do for now, tell me which
parts you're interested in if any...
[1] See ParallellAssignmentNode in Nodes.py for an example -- an
excellent kludge that works nicely and comes naturally from the current
code structure, but a kludge nonetheless. In particular, the parser has
to output stuff in *one certain way* (and in a way which is unnatural
for the parser, with its "local" view) because of what happens many
miles down the road. Design problems like this slow down development and
raise the probability of bugs.
(Rewrites have an even higher bug probability, though! So I'm not
advocating that -- I'm simply talking about how I'd like to write *new*
code!)
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev