Re: [Cython] Refactor or surgery on global constants?

Dag Sverre Seljebotn Sun, 03 Aug 2008 12:24:35 -0700

Stefan Behnel wrote:
> Hi,
> 
> Dag Sverre Seljebotn wrote:
>> In making it possible to have runnable code in pxds, I need to make a 
>> choice. The problem is that interned strings, objects etc. live in 
>> seperate scopes and you get double copies, basically now I get the 
>> following code twice in my c-file:
>>
>> static char __pyx_k___getbuffer__[] = "__getbuffer__";
>> static PyObject *__pyx_kp___getbuffer__;
>>
>> which gcc isn't too thrilled about.
>> [...]
>> 2) Move these things from the scopes to code.global (a global context of 
>> the CCodeWriter). This fits nicely with utility_code going over as well. 
>> This means that for now each scope simply pipe their things into code 
>> where they are merged, while in time then e.g. StringNode could intern a 
>> string during code generation rather than bothering with it during 
>> analysis. (I think this is a viable way forward, but may take an hour or 
>> three longer for me to do.)
> 
> Since I previously worked a bit on the string interning code, I think it would
> really benefit from such a change. Please go ahead.


1) OK I've made some changes in this direction:

- BlockNode now forwards the entries it deals with to Code.GlobalState, 
which tries to do the same thing but filters out duplicates (which come 
from pxds and pyxes doing the same thing). In time, I expect all of 
BlockNode to vanish.

- Also, the code dealing with interned strings/ints and cached builtins 
from ModuleNode has moved to Code.GlobalState, which uses a different 
strategy: rather than iterating over the contents of a scope, it simply 
allocates many CodeWriter instances/buffers and outputs initialization 
code at the same time as the declaration code, though to a separate 
buffer. (One can like this or not, I suppose -- the only argument I have 
is that it tends to keep code more "together" in one location, but the 
real reason is that it was easier this way.)

- One cannot have insertion points within functions (would 
create h). Therefore I extracted some things from the module 
initialization function into two more functions, __Pyx_InitGlobals and 
__Pyx_InitCachedBuiltins. I could potentially get rid of this with a 
special "I promise not to use temps or modify labels" insertion point, 
but this was easier. (Also, one could argue for it from a design 
viewpoint: c-file globals are more tied to the c-file as such than 
necessarily to the module -- if a single C file were to implement both a 
module and some of its submodules [1], then cached builtins could be 
shared but the module init func would not be!)
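
A minimal sketch of the approach described in the three points above, 
not the actual Code.GlobalState: the class name, attribute names and the 
emitted init line are assumptions made for illustration. It shows 
duplicate requests for the same interned string being filtered out, and 
declaration and initialization code being written at the same time but 
to separate buffers.

```python
import io


class GlobalStateSketch:
    """Hypothetical stand-in for Code.GlobalState; all names are illustrative."""

    def __init__(self):
        self.decls = io.StringIO()  # file-level declarations
        self.init = io.StringIO()   # body of a separate init function
        self.interned = {}          # string value -> C variable name

    def intern_string(self, value):
        # A second request for the same string (e.g. once from a pxd and
        # once from a pyx) returns the existing name and emits nothing.
        if value in self.interned:
            return self.interned[value]
        cname = "__pyx_kp_" + value
        self.interned[value] = cname
        # Declaration and initialization are emitted together,
        # but each goes to its own buffer.
        self.decls.write('static char __pyx_k_%s[] = "%s";\n' % (value, value))
        self.decls.write("static PyObject *%s;\n" % cname)
        # Illustrative init line only; the real generated code differs.
        self.init.write(
            "%s = PyString_InternFromString(__pyx_k_%s);\n" % (cname, value))
        return cname
```

Calling intern_string("__getbuffer__") twice on one instance yields the 
same C name both times and only one pair of declarations, which is the 
duplicate that gcc complained about in the original problem.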

This all runs the test cases etc. and I consider it "done" in some 
sense, but I want to have some feedback before I consider it ready for 
merging.

Of course, it represents a transitional solution. In the end, I expect 
one would add something like this:

self.result_code = code.globalcode.py_string_literal("abc")

and so on (details must be discussed when the day comes), but that, as 
my new mantra has become, "is pending a result_code refactoring". The 
design now allows for creating such a thing, though, and it would all 
happen within Code.py. (I'll get to that some day of course, especially 
as Greg has a lot of it done, but I have my GSoC priorities to worry 
about.)

[1] (Is this possible in Python? What I've done now opens the way for 
Cython to support this (think compiling a dir tree to a single .so) and 
to get some degree of sharing of constant literals etc., but that's 
hardly a priority.)

2) While writing this, I realized that some code could be written in a 
shorter fashion:

         (self.initwriter
            .putln("return 0;")
          .put_label(self.initwriter.error_label)
            .putln("return -1;")
          .putln("}")
          .exit_cfunc_scope()
          )

...where I've changed some functions to return "self" (much like C++ 
iostreams). Do you like or dislike this? Consider both with and without 
the indentation-matching-the-C-code. It does create some asymmetry 
(see the put_label where you need a new reference to the writer); this 
could be fixed by adding put_error_label, or by going back

The counterargument is that I could say "c = self.initwriter" first 
instead to get almost the same brevity.
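
To compare the two styles side by side, here is a small sketch. 
ChainWriter is a hypothetical stand-in, not the real CodeWriter; the 
label name and the label output format are assumptions. It includes the 
put_error_label helper suggested above, which removes the need for a 
second reference to the writer inside the chain.

```python
class ChainWriter:
    """Hypothetical writer whose put-methods return self so calls can chain."""

    def __init__(self):
        self.lines = []
        self.error_label = "__pyx_L1_error"  # assumed label name

    def putln(self, text):
        self.lines.append(text)
        return self

    def put_label(self, label):
        self.lines.append(label + ":;")
        return self

    def put_error_label(self):
        # Helper so a chain does not need to mention the writer again.
        return self.put_label(self.error_label)


def emit_chained(w):
    # Chained style, indented to mirror the generated C code.
    (w.putln("return 0;")
      .put_error_label()
        .putln("return -1;")
      .putln("}"))
    return w


def emit_with_alias(w):
    # Counterargument style: a short local alias, one call per line.
    c = w
    c.putln("return 0;")
    c.put_label(c.error_label)
    c.putln("return -1;")
    c.putln("}")
    return w
```

Both functions produce the same lines, so the choice is purely about how 
the generating code reads.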

Of course, this is only an added option for new code, so inconsistency 
would be added. Still, I'm +1/2: while writing all the C code in 
Buffer.py, I always thought that it tended to become quite long-winded 
to express what I wanted...

(Some code is using this in my branch which I'll redo if the vote is not 
positive.)

3) For buffers, this means that I've managed to strip all NumPy-specific 
code from Buffer.py. Instead, you can currently define __getbuffer__ and 
__releasebuffer__ in numpy.pxd.

I have not introduced a generic "final" mechanism or similar; rather an 
exception is introduced specifically for __getbuffer__ and 
__releasebuffer__ and specifically for the purpose of 
Py2.5-GetBuffer-faking. I'd like to leave it like this for now, but it 
will be a "hidden" feature and I will warn people not to rely on it.

In time I see it like this: A "final" language feature is added, and 
anything declared "final" can reside in pxds. But that is not done now, 
hence the special cases for these two functions.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
