Re: [Cython] Temp framework proposal

Dag Sverre Seljebotn Thu, 24 Jul 2008 07:32:34 -0700

I was thinking in very similar ways to what you suggest in order to
arrive at this proposal, so this ought to be interesting (and it is making
me see some weaknesses in my proposal too).

> Dag Sverre Seljebotn wrote:
>> I believe the following
>> temp refactoring is a) conceptually simple and not a big break with
>> current ways of doing things, b) can be made very incrementally (though
>> there may be inconsistencies in "how things are done" along the way), c)
>> is useful (at least to me).
>
> I don't think it should be necessary to go to this
> much upheaval in order to make temps easier to use.
> All you should really need is some suitable node types
> to use as building blocks during your tree transformation
> phases.
>
> I'm thinking of something like the following:
>
> * LocalNode - an ExprNode representing a temporary local
>    variable that is allocated using the temps mechanism.
>
> * LetNode - a StatNode that holds a list of LocalNodes
>    and a StatListNode. It functions like a "let" statement
>    in Lisp, i.e. binds some local variables, executes the
>    body and then disposes of the locals.

I was thinking about LetNode too, which I think is definitely needed --
but I think it is orthogonal as such; the issue is that we are discussing
how such a LetNode (and everything else using temps) should be
implemented.

Because LetNodes can be introduced after what is currently the
allocate_temps phase, the allocate_temps phase must then be moved (if we
keep it; I'm basically saying "turn it into a pre-generation transform").
But then I have a feeling that a lot of code will follow this pattern:

- During analysis/transformation, you know "what you are doing" and how
many temps you will need.
- Save this information for later in various ways.
- Read this information and allocate temps in the allocate_temps phase.

This seems like a reusable pattern that could be eliminated with another
design. LetNode would be one particular instance of this pattern; with
your suggestion it would have "custom" (non-reusable, in one sense of the
word) code to remember which temps it should allocate; then other custom
nodes have custom code again to pull them out. In a similar way, ExprNode
would need to remember that it should allocate its result as a temporary,
and so on.

This "duplication" won't disappear overnight with my approach, but I think
it opens up a way to move away from it.

Even so, I am still considering your approach! -- just noting my arguments
against it. A big weakness with my approach is this: Consider a node that
does something like the following:

process the A node
output calculation code involving a temporary
process the B node

My proposal would then assume that the temp is needed for the whole thing.
So perhaps an imperative approach is right after all. It is always
possible to create really complicated (but *optional*) declarations (like
"keep these temps during A and these during B and these for local
processing and these for the result"), but imperative code tends to
express this more nicely.
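
A minimal sketch of the lifetime problem described above. TempPool and the
two generate_* functions are hypothetical names invented for illustration,
not Cython's actual API; the point is only that a temp held for the whole
node blocks a slot that A and B could otherwise reuse.

```python
class TempPool:
    """Hypothetical pool that hands out temp names and recycles released ones."""

    def __init__(self):
        self.free = []   # names available for reuse
        self.count = 0   # number of distinct temps ever created

    def allocate(self):
        if self.free:
            return self.free.pop()
        self.count += 1
        return "tmp%d" % self.count

    def release(self, name):
        self.free.append(name)


def generate_imperative(pool):
    """The temp is held only around the middle step."""
    a = pool.allocate()   # stands in for a temp used while processing A
    pool.release(a)
    t = pool.allocate()   # the node's own temporary
    pool.release(t)
    b = pool.allocate()   # stands in for a temp used while processing B
    pool.release(b)
    return pool.count


def generate_declarative(pool):
    """The temp is declared for the whole node, so it is held across A and B."""
    t = pool.allocate()
    a = pool.allocate()
    pool.release(a)
    b = pool.allocate()
    pool.release(b)
    pool.release(t)
    return pool.count
```

Under these assumptions the imperative version needs one distinct temp
and the declarative version needs two.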

An imperative approach (still having methods in each node calling
allocate/release) for "fixing/anchoring" the entries, but still being able
to get hold of entries rather than strings (at a stage earlier than the
anchoring occurs) might be a compromise.
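
A rough sketch of what such a compromise might look like, assuming a
hypothetical TempEntry handle and Anchorer class (neither exists in Cython
under these names). Transforms pass the entry object around early; the C
name is only filled in when allocate is called imperatively later on.

```python
class TempEntry:
    """Hypothetical handle to a temp; its C name is unknown until anchored."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.cname = None   # filled in by Anchorer.allocate


class Anchorer:
    """Hypothetical allocator that binds entries to concrete names."""

    def __init__(self):
        self.free = {}      # type name -> list of reusable C names
        self.counter = 0

    def allocate(self, entry):
        free = self.free.setdefault(entry.type_name, [])
        if free:
            entry.cname = free.pop()
        else:
            self.counter += 1
            entry.cname = "tmp%d" % self.counter

    def release(self, entry):
        self.free[entry.type_name].append(entry.cname)
```

Every node that holds a reference to the same TempEntry sees the same
name once it is anchored, and a released name can be handed to a later
entry of the same type.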

> The code for the LetNode would work in a similar way to
> the existing code in the for-statement node and other
> places that use temporary locals, except that instead of
> a fixed set of locals, it would do it for whatever was in
> its list of LocalNodes.
>
> In the body of the LetNode, wherever you wanted to refer
> to one of the locals, you would insert a CloneNode (an
> already-existing node type) that references the relevant
> LocalNode.

Entry seems like a more natural concept for this to me -- because it is a
more "refined", more basic concept of a handle to a variable in the scope.
This could then be used in many situations. For this case, I would use
NameNode. It already accesses lots of different symbols; I have looked
into making some trivial changes to NameNode so that if you provided
node.entry on construction it wouldn't go looking for its node.name, so
you basically do this to, say, unpack a cascaded assignment (node is the
cascaded assignment, and this is pseudocode; the details are not right):

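def visit_CascadedAssignmentNode(self, node):
  # Bind one temp of the rhs type for the duration of the let body.
  let = LetNode(temp_types=[node.rhs.type])
  tmp = let.entries[0]
  let.body = StatListNode(stats=[
    # Evaluate the rhs once into the temp...
    SingleAssignmentNode(lhs=NameNode(entry=tmp), rhs=node.rhs)
  ] + [
    # ...then assign the temp to each target in turn.
    SingleAssignmentNode(lhs=lhs, rhs=NameNode(entry=tmp))
    for lhs in node.lhses
  ])
  return let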

Note that one reason for this is that if we start transforming into more
basic operations (consider this hypothetical!), then one thing one might
want to do is unrolling nested expressions. So you have a transform that
basically unpacks every nested expression into assignments to temporaries
(this might be a good pre-step to some code analysis algorithms that would
benefit from being able to insert if-statements in the middle of nested
expressions, for instance). For the "primitive" statements that are left I
would then consider it unnatural that NameNode, CloneNode and so on were
exceptions to the non-nesting rule; rather, each statement would contain
references to entries rather than expressions.
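
To make the hypothetical unnesting transform concrete, here is a small
sketch. It does not use Cython's node classes; expressions are plain
tuples of the form (op, operand, ...) with strings as leaf names, and
flatten/unnest are names invented for this illustration.

```python
import itertools


def flatten(expr, stats, counter):
    """Return a name holding expr's value, appending (target, op, args) to stats."""
    if isinstance(expr, str):
        # Leaf: already a plain name, nothing to unpack.
        return expr
    op, *operands = expr
    # Flatten operands first so every argument is a plain name.
    args = [flatten(operand, stats, counter) for operand in operands]
    target = "t%d" % next(counter)
    stats.append((target, op, args))
    return target


def unnest(expr):
    """Turn a nested expression into a list of non-nested assignments."""
    stats = []
    result = flatten(expr, stats, itertools.count(1))
    return stats, result
```

For the expression ("add", ("mul", "a", "b"), "c") this gives the two
statements ("t1", "mul", ["a", "b"]) and ("t2", "add", ["t1", "c"]),
with "t2" holding the result. Each statement then refers only to names,
which is the role entries would play.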

Again, this was just designed to push your approach a bit. Getting
something that
>> I think it
>> is ok if we agree that result_code should be refactored at some later
>> point if one gets the time and resources; i.e. one stores "result_entry"
>> and other entry references instead at analysis time and only calculate
>> result_code at code generation time).
>
> You really ought to look at how this is in the current
> Pyrex before doing anything about this. The result code
> is now retrieved by calling result_as(), or something else
> that ends up calling it, during code generation. So there's
> no longer a requirement to calculate the result code at
> analysis time -- it can be constructed on the fly at code
> generation time if need be.

True, but you then need to remember what to generate, as I said above!

But I will definitely not touch result_code without looking at what you
have done. Do you have any "commits" or similar that would contain these
changes relatively self-contained, or would the best thing to do be taking
a diff between different Pyrex versions?

> This means it would, I think, be fairly straightforward
> now to merge all the temp allocation/release stuff into
> the code generation phase, although I haven't looked in
> detail at what would be required.

Unless you cache the function bodies, it would need to be done right before
generation, so that you know which local variables to declare. If it weren't for that,
imperative temp allocation interleaved with code.put-statements wouldn't
be that bad...
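
A sketch of the "cache the function body" option, under the assumption
of a hypothetical FunctionWriter class (not Cython's code writer) and
with every temp declared as a C int purely for brevity. The body is
buffered while temps are allocated imperatively, and the declarations
are emitted in front of it at the end.

```python
class FunctionWriter:
    """Hypothetical writer that buffers the body until all temps are known."""

    def __init__(self):
        self.body = []       # buffered body lines
        self.free = []       # released temp names available for reuse
        self.declared = []   # every distinct temp name handed out

    def put(self, line):
        self.body.append(line)

    def allocate_temp(self):
        if self.free:
            return self.free.pop()
        name = "tmp%d" % (len(self.declared) + 1)
        self.declared.append(name)
        return name

    def release_temp(self, name):
        self.free.append(name)

    def getvalue(self):
        # Declarations can only be written once the whole body has been seen.
        decls = ["int %s;" % name for name in self.declared]
        return "\n".join(decls + self.body)
```

Usage would interleave allocate_temp, put and release_temp freely, and
call getvalue only once the function is complete.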

Dag Sverre
