On 5/19/08, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> I actually like the way it's in Py3. Unicode is the right thing most of
> the time - except when you deal with C-APIs as in Cython, where the best
> place to handle unicode is right below the API level, and nowhere else in
> your code. :)
Yep, that makes a lot of sense; you are definitely right here.
> > Guido decided Python 3 not to
> > support the u"abc" form.
>
> It makes writing portable code very hard, just think of code that must
> support Python 2.3-3.0. I'm currently wrapping all string literals in the
> test cases in lxml with a function call _bytes() or _str(), which then
> does the right thing depending on the runtime environment. But it's a
> whole bunch of work to manually put this all over the place...
Indeed. And I'll probably use the same 'trick' you are using (but
perhaps implement it using the Python C-API).
> I could agree on automatic promotion of docstrings and maybe even
> exception messages to unicode strings, but such a selective automatism
> would be somewhat surprising to users. And I'm a big fan of "explicit is
> better than implicit".
You already convinced me that automatic promotion is a really bad
idea, even in those 'special' cases.
> Right, this actually currently works (sort-of) in Py2:
>
> cdef char* val
> uval = u"abc"
> val = uval
> print repr(val)
>
> prints 'abc' in Py2 and raises a TypeError in Py3. If you use non-ASCII
> letters, however, this fails with a UnicodeDecodeError in Py2. It would
> really be better if Cython caught that for the literal case and raised at
> least a runtime TypeError in the case above. And I mean: always, not just
> with a command line switch. As this will really help users by showing them
> where work has to be done.
Perhaps this stricter behavior will be really helpful, so I'm +1 on it.
Still, a non-programmatic way to DISABLE this check would also be
needed, just for backward compatibility, for Cython users who are not
ready to fix their Py2.X code.
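To make the proposal concrete, here is a rough Python model of the coercion rule being discussed. The function as_c_string() and its strict flag are purely hypothetical stand-ins for what the compiler would do on a unicode to char* assignment and for the "disable" switch; nothing here is an existing Cython option:

```python
def as_c_string(value, strict=True):
    """Hypothetical model of coercing a Python object to char*."""
    if isinstance(value, bytes):
        # Byte strings map directly onto char*.
        return value
    if strict:
        # Proposed default: refuse unicode and point the user at the spot
        # where an explicit .encode() is needed.
        raise TypeError("expected bytes, got %s" % type(value).__name__)
    # Legacy behavior with the check disabled: implicit ASCII encoding,
    # which still fails for non-ASCII text.
    return value.encode("ascii")
```

With strict=True the unicode case always fails early; with strict=False only non-ASCII input fails, which is roughly the Py2 behavior described in the quote.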
> > * A new C pseudo-type have to be added, lets call it 'uchar' (better
> > name would be needed, it can be confused with unsigned char).
>
> I assume you mean a conversion to UTF-8 here, in which case "utf8char"
> would be appropriate IMHO.
Yes, I meant a conversion to UTF-8.
> Still, I find
>
> s.encode("UTF-8")
>
> so short and explicit, that I don't see a major need for a special type
> name here. And in many, many cases, you will even be able to say
>
> def dostuff(text):
>     cdef char* c_s
>     text = text.encode("UTF-8")
>     c_s = text
>     ...
>
> so you don't even need to care about GC or anything, as "text" will stay
> alive during the function call.
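The same idea can be tried out in plain Python with ctypes, as a rough analogue of the Cython snippet above. This is only a sketch: ctypes.c_char_p stands in for the 'cdef char*' variable, and the byte count returned is just a way to show that the pointer sees the UTF-8 data:

```python
import ctypes

def dostuff(text):
    # Encode explicitly and keep the result in a local name, so the
    # byte buffer stays alive for the whole function call.
    encoded = text.encode("UTF-8")
    # c_char_p plays the role of 'char* c_s' pointing into that buffer.
    c_s = ctypes.c_char_p(encoded)
    # Read the NUL-terminated data back through the pointer.
    return len(c_s.value)
```

Note that the length is counted in bytes, not characters, so a string with one two-byte character comes out one longer than its character count.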
It's shorter, but 'cdef utf8char* c_s = text' is even shorter and
explicit as well, and it can be implemented with pure Python C-API
calls. But
> Regarding a policy, I have decided to get lxml's Cython code clean and the
> Python code portable without 2to3. That's more work than trying to find
> ways to cheat, but it's the right thing to do, and the safest option.
I second you here. I hope at some point something like a 3to2 (note
the reversed numbers) tool is provided. As Python 3 is cleaner and
stricter, perhaps going from the Py3 syntax and semantics to the Py2
ones will be easier, and then, in such a scenario, we can write code
directly for Python 3 and make it backward compatible with the Py2.X
series.
Regards,
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594