Re: [Cython] Unicode issues

Stefan Behnel Tue, 20 May 2008 05:59:05 -0700

Greg Ewing wrote:
> Stefan Behnel wrote:
>> b) this makes it very easy to write
>> buggy code that works perfectly until someone passes non-ASCII
>> characters.
>
> That's what I don't follow. Code such as
>
>    cdef char *p
>    p = s.encode('ascii')


Currently, this would instead be written as

     cdef char *p
     b = s.encode('ascii')
     p = b

> has exactly the same property, as far as I can see -- it
> works until someone passes it non-ascii characters. I
> would call it a limitation rather than a bug.

The difference from this code

    cdef [u]char* p
    p = s

is that the explicit version further up converts to an encoding chosen by
the user and makes clear what happens when, whereas it is not immediately
visible from this implicit version that it 1) allocates memory for unicode
strings but not for byte strings, 2) garbage collects a temporary string
at some point the user cannot control, 3) converts characters to bytes and
thus may fail for some unicode strings and byte strings.

This is neither symmetric to the bytes->char* coercion process (which
never fails for any kind of byte string), nor is it transparent that there
are non-trivial things happening.
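
To illustrate the asymmetry, here is a rough plain-Python model of what an
implicit "p = s" coercion would have to do behind the scenes. This is only a
sketch under assumptions: implicit_coerce is a hypothetical name, ASCII as the
implicit encoding is assumed from the example quoted above, and Python 3
str/bytes semantics are used. It is not Cython's actual coercion code.

```python
def implicit_coerce(s):
    """Hypothetical model of an implicit str/bytes -> char* coercion."""
    if isinstance(s, bytes):
        # Byte strings pass through unchanged: no allocation, cannot fail.
        return s
    # Unicode strings need a hidden conversion: this allocates a new
    # temporary bytes object and raises UnicodeEncodeError for any
    # character outside the (assumed) ASCII encoding.
    return s.encode("ascii")


def works_for(s):
    """Return True if the implicit coercion succeeds for s."""
    try:
        implicit_coerce(s)
        return True
    except UnicodeEncodeError:
        return False
```

Any byte string goes through, and so does an ASCII-only unicode string, but
works_for(u"caf\xe9") is False: the same one-line assignment succeeds or fails
depending on the data, and nothing in the calling code hints at the encoding
step or the temporary object.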

"Explicit is better than implicit" definitely holds for everything that
involves non-trivial magic and memory allocation.

Stefan

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
