On 5/14/08, Stefan Behnel wrote:
> [patch moved here]
>
> > --- a/Cython/Compiler/Nodes.py Tue May 13 23:41:11 2008 +0200
> > +++ b/Cython/Compiler/Nodes.py Wed May 14 12:45:16
>
> Can you tell me why? This actually creates strings as unicode strings that
> were byte strings in the source code.
It was just an experiment, because I could not import my extension at the
time I wrote that.
> The byte strings are interned in Py2, where identifiers are byte strings, and
> the unicode strings are interned in Py3, where identifiers are unicode
> strings.
OK,
> > should Cython save (in the Py3 case)
> > byte strings in their internal table?
>
> We can't recognise the Py3 case at Cython compile time. Only the C compiler
> knows what the target environment is.
I'll reformulate the question. Why are byte/unicode string literals not
going to be managed in the same table as the strings for identifier
names? Why not treat them the same way as integer literals?
Wait a minute!! Now I see, you are clever... if we do
D = {}
D["abc"] = 1
v = D["abc"]
then if the "abc" literal was interned, the lookup in the last line
will benefit from the interning. So yes, you are right, it DOES make
sense to intern string literals just like identifier names...
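
A minimal sketch of the point above, assuming CPython (where a dict lookup
checks key identity before falling back to hash and equality comparison).
The helper name build() is made up for illustration; it only exists to
create a string at run time that is not interned.

```python
import sys

def build(parts):
    # Hypothetical helper: joins several pieces at run time, so on CPython
    # the result is a fresh string object that is not interned.
    return "".join(parts)

key = sys.intern("abc")                      # interned string used as the dict key
same = sys.intern(build(["a", "b", "c"]))    # interning maps it to the same object
other = build(["a", "b", "c"])               # equal value, but a distinct object

D = {}
D[key] = 1

# Both lookups succeed; the interned one can match on identity alone,
# the other one needs a hash match plus a full string comparison.
print(D[same], same is key)
print(D[other], other is key)
```

The values found are the same either way; interning only changes how
cheaply the key is matched.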
> That's what I was considering, too, although not quite as you describe. The
> difference between the two is that the real identifiers must become either
> byte strings or unicode, depending on the compile time Python version. The
> normal strings must be created as they appeared in the source code, and either
> unicode strings or byte strings can be interned based on the compile time
> Python version. So I now added another field to the string tab that states if
> the string was interned as identifier, which will then make it pop up as
> either unicode or byte string depending on the C compile time Python version.
OK, this seems now to me the right way.
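
To check my understanding, here is a rough sketch of what such a string
table entry could look like. This is NOT the actual Cython data structure:
the class StringTabEntry, its fields, and the function runtime_type() are
all names I made up for illustration. In the real thing the decision is
made by the C compiler, so py_major here merely stands in for the Python
version seen at C compile time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StringTabEntry:
    # Hypothetical layout, for illustration only.
    cname: str            # name of the C variable holding the object
    value: str            # the literal text
    is_unicode: bool      # True if written as a unicode literal in the source
    is_identifier: bool   # True if the string is used as an identifier name

def runtime_type(entry, py_major):
    """Return which string type the entry should become at run time."""
    if entry.is_identifier:
        # Identifiers follow the target Python: bytes in Py2, unicode in Py3.
        return "unicode" if py_major >= 3 else "bytes"
    # Ordinary literals keep the type they had in the source code.
    return "unicode" if entry.is_unicode else "bytes"

name = StringTabEntry("__pyx_n_abc", "abc", is_unicode=False, is_identifier=True)
literal = StringTabEntry("__pyx_k_abc", "abc", is_unicode=False, is_identifier=False)

for major in (2, 3):
    print(major, runtime_type(name, major), runtime_type(literal, major))
```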
> This (plus a fix for importing based on unicode module names) even gets almost
> all test cases green, just a few to go. :)
And now everything is working for me with the current cython-devel-py3 repo!!!
Except for the parts that are my fault because of poor string handling.
> One big remaining problem are the PyFile_* functions - as used in "print". I
> guess we'll have to wait for 3.0b1 here to provide a fix.
I have not looked at this yet.
> Another thing is the removal of __setslice__ and __delslice__, i.e. the
> sq_ass_slice slot. This means that code that uses these will no longer
> compile. I added a warning, but there are two test cases that depend on it. We
> might want to remove them.
I would not worry too much about them. Look at
http://docs.python.org/ref/sequence-methods.html: they have been deprecated
since Python release 2.0. Or perhaps Cython should transform them and
define __getitem__/__setitem__/__delitem__ instead. And if both variants are
implemented, generate a warning (perhaps an error?) EVEN in the Py2
case, as there is no point in defining both.
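
For reference, a small sketch of how the item methods cover the slice
cases on their own: the index argument simply arrives as a slice object.
The class name Row and its attribute are made up for illustration.

```python
class Row:
    """Hypothetical container that supports slicing without
    __getslice__/__setslice__/__delslice__."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        # index is an int for row[i] and a slice object for row[i:j]
        return self.items[index]

    def __setitem__(self, index, value):
        # row[i:j] = seq arrives here with index = slice(i, j, None)
        self.items[index] = value

    def __delitem__(self, index):
        # del row[i:j] arrives here with index = slice(i, j, None)
        del self.items[index]

row = Row([0, 1, 2, 3, 4])
row[1:3] = ["a"]      # handled by __setitem__
del row[0:2]          # handled by __delitem__
print(row.items)
```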
> I tested the generated code under Py2.3.6, Py2.4.4, Py2.5.1 and Py3.0a5 now.
> Except for a couple of remaining problems in Py3, it still works in all of
> them. :]
Almost everything is working for me now on Py2/Py3, too. However, I suspect
that the new method cache in type objects (Py2.6 and Py3.0) does not play
well with the code Cython generates, but only in the cases where Cython plays
games with the 'tp_dict' field of type objects...
Regards, and let me say you have done pretty good work on all this...
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594