On 5/19/08, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> I actually like the way it's in Py3. Unicode is the right thing most of
>  the time - except when you deal with C-APIs as in Cython, where the best
>  place to handle unicode is right below the API level, and nowhere else in
>  your code. :)

Yep, that makes a lot of sense, you are definitely right here...

>  > Guido decided Python 3 not to
>  > support the u"abc" form.
>
> It makes writing portable code very hard, just think of code that must
>  support Python 2.3-3.0. I'm currently wrapping all string literals in the
>  test cases in lxml with a function call _bytes() or _str(), which then
>  does the right thing depending on the runtime environment. But it's a
>  whole bunch of work to manually put this all over the place...

Indeed. And I'll probably use the same 'trick' you are using (but
perhaps implement it using the Python C-API).

>  I could agree on automatic promotion of docstrings and maybe even
>  exception messages to unicode strings, but such a selective automatism
>  would be somewhat surprising to users. And I'm a big fan of "explicit is
>  better than implicit".

You already convinced me that automatic promotion is a really bad
idea, even in those 'special' cases.

> Right, this actually currently works (sort-of) in Py2:
>
>     cdef char* val
>     uval = u"abc"
>     val = uval
>     print repr(val)
>
>  prints 'abc' in Py2 and raises a TypeError in Py3. If you use non-ASCII
>  letters, however, this fails with a UnicodeDecodeError in Py2. It would
>  really be better if Cython caught that for the literal case and raised at
>  least a runtime TypeError in the case above. And I mean: always, not just
>  with a command line switch. As this will really help users by showing them
>  where work has to be done.

Perhaps this stricter behavior will be really helpful, so I'm +1 on it.
Still, a non-programmatic way to DISABLE this check would also be
needed, just for backward compatibility, for Cython users who are
not ready to fix their Py2.X code.
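
To make the proposal concrete, here is a rough Python model of the
coercion to char*. Everything in it (coerce_to_char_ptr, the strict
flag) is hypothetical and only illustrates the behavior under
discussion; a real check would live in the C code Cython generates, and
the opt-out would presumably be a compiler option rather than a
function argument.

```python
def coerce_to_char_ptr(value, strict=True):
    """Model of assigning a Python object to a 'char*' variable."""
    if isinstance(value, bytes):
        # Byte strings map directly onto char*.
        return value
    if isinstance(value, type(u"")):
        if strict:
            # Proposed behavior: never convert unicode implicitly.
            raise TypeError("cannot coerce unicode to char*, encode it first")
        # Lenient behavior: implicit ASCII encoding, which raises
        # a UnicodeError for non-ASCII text.
        return value.encode("ascii")
    raise TypeError("expected a byte string, got %s" % type(value).__name__)
```

In strict mode the unicode case fails every time, so the places that
need an explicit encode() show up immediately instead of only when
non-ASCII data arrives.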


>  > * A new C pseudo-type has to be added, let's call it 'uchar' (better
>  > name would be needed, it can be confused with unsigned char).
>
> I assume you mean a conversion to UTF-8 here, in which case "utf8char"
>  would be appropriate

Yes, I meant a conversion to UTF-8.
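
In plain Python terms, the conversion such a pseudo-type would perform
is roughly the following. The name to_utf8 is hypothetical and used
only for illustration; passing byte strings through unchanged is also
an assumption (they are taken to be UTF-8 already).

```python
def to_utf8(text):
    """Return UTF-8 encoded bytes for a text or byte string."""
    if isinstance(text, bytes):
        # Already a byte string: pass it through unchanged.
        return text
    # Unicode text: encode explicitly as UTF-8.
    return text.encode("UTF-8")
```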


> IMHO. Still, I find
>
>      s.encode("UTF-8")
>
>  so short and explicit, that I don't see a major need for a special type
>  name here. And in many, many cases, you will even be able to say
>
>    def dostuff(text):
>      cdef char* c_s
>      text = text.encode("UTF-8")
>      c_s = text
>      ...
>
>  so you don't even need to care about GC or anything, as "text" will stay
>  alive during the function call.

It's shorter, but 'cdef utf8char* c_s = text' is even shorter and
explicit as well, and it can be implemented with pure Python C-API
calls. But


> Regarding a policy, I have decided to get lxml's Cython code clean and the
>  Python code portable without 2to3. That's more work than trying to find
>  ways to cheat, but it's the right thing to do, and the safest option.

I second you here. I hope at some point something like a 3to2 (note
the reversed numbers) tool is provided. As Python 3 is cleaner and
stricter, perhaps going from the 3 syntax and semantics to the 2 ones
will be easier, and then, in such a scenario, we can write code directly
for Python 3 and make it backward compatible with the Py2.X series.

Regards,


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594