Re: [Cython] Temp allocation flow

Dag Sverre Seljebotn Sat, 29 Nov 2008 00:51:03 -0800

Thanks a lot for chiming in -- there's always the danger of running into 
the same dangers (that we don't see) that you already avoided.

Greg Ewing wrote:
> Be careful here -- you can't just stuff the result
> straight into the lhs. You need to evaluate the rhs,
> make sure you have a new reference to it, decref the
> lhs, and then do the assignment, in that order. So
> you need a temp for the rhs anyway, if it's anything
> other than a bare name.

Good point.

It means that my scheme would likely also need 
"node.can_raise_exception" or similar, if one wants to optimize when the 
rhs is a NameNode.

(If the rhs is a pure C statement it is also OK to decref the lhs before 
evaluation of rhs, but not sure if I'd bother with it, one can always be 
lazy and let can_raise_exception stay at a default True).

> I can't really see what you're expecting to gain from
> this change, over and above what you'll get simply by
> moving temp allocation to code generation time. At best
> you'll move some of the generic code from one place to
> another; at worst you'll end up duplicating it in many
> parent nodes.

First of all, if I can get temp allocation moved to code generation time 
I am mostly happy, no matter how it is done. But Stefan noted some 
problems in doing that, and I wondered if this approach would perhaps 
make that move quicker and easier. Wondered, I'm not sure yet.

However I also feel it might in general fit better within that framework 
(temps during code generation), where temps are looked at more as 
registers that are allocated and deallocated while outputting code 
statements.

Case study: The snippet below (and I must say I am really thankful for 
the extensive comments)

     def allocate_temps(self, env, result_code = None):
         #  We only ever evaluate one side, and this is
         #  after evaluating the truth value, so we may
         #  use an allocation strategy here which results in
         #  this node and both its operands sharing the same
         #  result variable. This allows us to avoid some
         #  assignments and increfs/decrefs that would otherwise
         #  be necessary.
         self.allocate_temp(env, result_code)
         self.test.allocate_temps(env, result_code)
         self.true_val.allocate_temps(env, self.result())
         self.false_val.allocate_temps(env, self.result())
         #  We haven't called release_temp on either value,
         #  because although they are temp nodes, they don't own
         #  their result variable. And because they are temp
         #  nodes, any temps in their subnodes will have been
         #  released before their allocate_temps returned.
         #  Therefore, they contain no temp vars that need to
         #  be released.

would be removed, and instead it would be put into the code generation 
flow itself (! marks new lines)

     def generate_evaluation_code(self, code):
!       tmp = code.funcstate.allocate_temp(...etc...)
!       self.test.set_target(tmp)
         self.test.generate_evaluation_code(code)
         code.putln("if (%s) {" % tmp )
!       self.true_val.set_target(self.target)
         self.true_val.generate_evaluation_code(code)
         code.putln("} else {")
!       self.false_val.set_target(self.target)
         self.false_val.generate_evaluation_code(code)
         code.putln("}")
         self.test.generate_disposal_code(code)
!       code.funcstate.release_temp(tmp)

I find the latter (interleave the temps) more easy to write than having 
a seperate phase, but that could be a matter of taste.

Without doing this, and *only* move the temp allocation phase to code 
generation time, the former code (the seperate allocate_temp phase) is 
still needed and in fact many of the benefits with temps at generation 
time disappear.

> The only possible benefit I can think of is that it
> might eliminate the need for CloneNodes. But I'm not
> so sure about that -- you'll still need some way of
> differentiating between the "owner" of a node and
> nodes that simply use its value, so you don't end up
> trying to generate evaluation code for it more than
> once.

CloneNodes-situations should now usually be dealt with by using 
TempsBlockNode instead. But of course, one wouldn't like to have to do 
everything -- rebuilding everything is a real danger and I haven't 
finally settled on +1-ing my own approach yet, but I want to investigate it.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Re: [Cython] Temp allocation flow

Reply via email to