Re: [Cython] Unicode issues

Robert Bradshaw Wed, 21 May 2008 13:33:38 -0700

On May 20, 2008, at 5:58 AM, Stefan Behnel wrote:

> Greg Ewing wrote:
>> Stefan Behnel wrote:
>>> b) this makes it very easy to write
>>> buggy code that works perfectly until someone passes non-ASCII
>>> characters.
>>
>> That's what I don't follow. Code such as
>>
>>    cdef char *p
>>    p = s.encode('ascii')
>
> Currently, this would rather be
>
>      cdef char *p
>      b = s.encode('ascii')
>      p = b
>
>> has exactly the same property, as far as I can see -- it
>> works until someone passes it non-ascii characters. I
>> would call it a limitation rather than a bug.
>
> The difference to this code
>
>     cdef [u]char* p
>     p = s
>
> is that the code above does an explicit conversion to a user-defined
> encoding and makes clear what happens when, whereas it is not
> immediately visible from the code below that it 1) allocates memory
> for unicode strings but not for byte strings, 2) garbage collects a
> temporary string at some non-user-configurable point, 3) converts
> characters to bytes and thus may fail for some unicode strings and
> byte strings.
>
> This is neither symmetric to the bytes->char* coercion process (which
> never fails for any kind of byte string), nor is it transparent that
> there are non-trivial things happening.
>
> "Explicit is better than implicit" definitely holds for everything
> that involves non-trivial magic and memory allocation.


I think points (1) and (2) are non-issues--both Python and Cython/Pyrex
implicitly allocate temporary objects all over the place so the user
doesn't have to be bothered with them. That leaves (3), the question of
whether to allow any implicit string <--> char* conversions (because
it's so convenient, especially given the ubiquitous nature of ASCII)
or not (because if it's not explicit, it's a bug).
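
To make point (3) concrete, here is a rough pure-Python sketch of the
asymmetry under discussion. It is only an illustration, not what Cython
or Pyrex actually generates: to_c_bytes() and try_convert() are
hypothetical helpers, the default 'ascii' encoding is an assumption, and
it uses Python 3 str/bytes semantics.

```python
def to_c_bytes(s, encoding="ascii"):
    """Hypothetical helper: return the bytes a char* would end up pointing to."""
    if isinstance(s, bytes):
        # bytes -> char*: no conversion is needed, so this never fails.
        return s
    # str -> char*: an encoding step allocates a new bytes object and
    # raises UnicodeEncodeError for characters outside the encoding.
    return s.encode(encoding)


def try_convert(s):
    """Return (ok, bytes_or_error_name) instead of raising."""
    try:
        return True, to_c_bytes(s)
    except UnicodeEncodeError as exc:
        return False, type(exc).__name__


if __name__ == "__main__":
    for value in ("abc", b"\xff", "caf\u00e9"):
        print(repr(value), try_convert(value))
```

The byte string goes through untouched whatever it contains, while the
unicode string only works as long as it stays within ASCII. Whether that
failure should be reachable without an explicit encode() call is exactly
the open question.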

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
