Re: [Cython] Unicode issues

Stefan Behnel Thu, 22 May 2008 08:26:56 -0700

Hi,

Robert Bradshaw wrote:
> On May 20, 2008, at 5:58 AM, Stefan Behnel wrote:
>> it is not immediately
>> visible from the code below that it 1) allocates memory for unicode
>> strings but not for byte strings, 2) garbage collects a temporary string
>> at some non-user-configurable point, 3) converts characters to bytes and
>> thus may fail for some unicode strings and byte strings.
>>
>> This is neither symmetric to the bytes->char* coercion process (which
>> never fails for any kind of byte string), nor is it transparent that
>> there are non-trivial things happening.
>>
>> "Explicit is better than implicit" definitely holds for everything that
>> involves non-trivial magic and memory allocation.
> 
> I think points (1) and (2) are non-issues--both Python and Cython/Pyrex
> implicitly allocate temporary objects all over the place so the user
> doesn't have to be bothered with it.


I think you are referring to things like adding C numbers to Python numbers in
Python space. That's a trivial case where little memory is involved, and these
objects will be cleaned up almost immediately. Here, we are talking about
duplicating data in memory (a potentially large string) where the user only
asked for a C pointer to it. I find that *very* opaque.

Currently, this code

   cdef some_c_type var = some_py_value

involves very straightforward coercion code for all types I can think of. If
we allow automatic conversion from unicode to char* in the ASCII case, the
simple statement

   cdef char* s = some_py_value

will require special handling of different Python types (str and bytes, maybe
buffer?) and copying and converting the data in some cases but not in others.
That's really not the same as reading the long value out of a PyInt.
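
To make the asymmetry concrete, here is a rough pure-Python sketch of the
branching such an implicit coercion would have to do. It is only an
illustration, not Cython's actual coercion code: the helper name
`coerce_to_char_buffer` is hypothetical, and it assumes Python 3 naming, where
`str` is the unicode type and `bytes` is the byte string type.

```python
def coerce_to_char_buffer(value):
    """Hypothetical helper mimicking an implicit 'char*' coercion.

    Returns (buffer, copied), where 'copied' tells whether a new
    bytes object had to be allocated to hold the data.
    """
    if isinstance(value, bytes):
        # Byte string: the existing buffer can be handed out directly.
        # No copy is made and this branch cannot fail.
        return value, False
    if isinstance(value, str):
        # Unicode string: the characters must be encoded first, which
        # allocates a new bytes object and raises UnicodeEncodeError
        # as soon as a non-ASCII character is encountered.
        return value.encode("ascii"), True
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)
```

The byte string branch is a plain lookup, while the unicode branch both
duplicates the data and can raise, yet nothing at the call site tells the two
apart.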

Stefan

