Re: [Cython] first lessons learned while porting lxml to Py3

Stefan Behnel Mon, 19 May 2008 14:58:21 -0700

Hi,

Lisandro Dalcin wrote:
> On 5/19/08, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>>  > Guido decided that Python 3 would not
>>  > support the u"abc" form.
>>
>> It makes writing portable code very hard, just think of code that must
>>  support Python 2.3-3.0. I'm currently wrapping all string literals in the
>>  test cases in lxml with a function call _bytes() or _str(), which then
>>  does the right thing depending on the runtime environment. But it's a
>>  whole bunch of work to manually put this all over the place...
> 
> Indeed. And I'll probably use the same 'trick' you are using (but
> perhaps implement it using Python C-API)


Hmmm, but you don't have to do that in Cython code (at least the way it's
currently implemented). String literals do not change semantics there, so if
you use the correct string types everywhere, the generated C code will just
work unchanged in Py2 and Py3.

They do, however, change in unmodified Python code - which is a pity if you
really have to test byte strings and unicode strings, and can't prefix the
first with 'b' as your code has to run in 2.3...


>> Still, I find
>>      s.encode("UTF-8")
>>
>>  so short and explicit, that I don't see a major need for a special type
>>  name here. And in many, many cases, you will even be able to say
>>
>>    def dostuff(text):
>>      cdef char* c_s
>>      text = text.encode("UTF-8")
>>      c_s = text
>>      ...
>>
>>  so you don't even need to care about GC or anything, as "text" will stay
>>  alive during the function call.
> 
> It's shorter, but 'cdef utf8char* c_s = text' is even shorter and
> explicit as well, and it can be implemented with pure Python C-API
> calls.

But that only works as is in Py3. In Py2, Cython will have to do all of the
magic itself, including the automatic cleanup of the UTF-8 encoded string.
That is comparable to this:

    cdef char* s = "abc" + some_string

for which Cython currently raises a compiler error as you take the pointer to
a temporary variable. So this will have to be changed first, before the
coercion feature can be enabled for unicode strings in both Py2 and Py3.


>> Regarding a policy, I have decided to get lxml's Cython code clean and the
>>  Python code portable without 2to3. That's more work than trying to find
>>  ways to cheat, but it's the right thing to do, and the safest option.
> 
> I second you here. I hope at some point something like a 3to2 (note
> the reversed number) tool is provided. As Python 3 is cleaner and
> stricter, perhaps going from Python 3 syntax and semantics to those of
> Python 2 will be easier, and then, in such a scenario, we can write code
> directly for Python 3 and make it backward compatible with the Py2.X series.

Yep, there was some discussion on the Py3k list on this. No idea what became
of it...

Stefan


