Re: [Cython] Another string encoding idea

Robert Bradshaw Thu, 03 Dec 2009 10:28:38 -0800

On Dec 3, 2009, at 6:41 AM, Lisandro Dalcin wrote:

> On Thu, Dec 3, 2009 at 4:50 AM, Stefan Behnel <[email protected]> wrote:
>>
>> Robert Bradshaw, 03.12.2009 02:01:
>>> On Dec 1, 2009, at 12:56 AM, Stefan Behnel wrote:
>>>> We could have a "cython.str()" function that converts char*+length or
>>>> a char* buffer to bytes or unicode depending on the platform and using
>>>> either the platform encoding or a different one passed as argument.
>>>> So you'd return "cython.str(c_string, length)" (or "cython.str(s)" for
>>>> the example above) and be happy.
>>>
>>> That's a good idea, and should probably go in regardless of whatever
>>> else happens.
>>
>> Ok, so then we have three different cases for the char*->Python path:
>>
>> 1) create bytes - that's what currently happens automatically
>>
>> 2) create unicode - easy to do with "s.decode(enc)"
>>
>> 3) create str (i.e. bytes in Py2 and unicode in Py3) - easy to do with a
>> future "cython.str(s)" or "cython.str(s[:length])", optionally taking an
>> encoding as second argument and defaulting to the platform encoding
>> otherwise.


Yep.

>
> And you forgot mapping NULL to None...

Oh... that's a nice thought as well.
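
As a rough illustration of the three cases above plus the NULL case,
here is a pure-Python sketch. It is not Cython and not an existing API:
the helper names are hypothetical, a char* is modelled as a bytes
object, NULL is modelled as None, and "platform encoding" is assumed to
mean sys.getdefaultencoding().

```python
import sys

def to_bytes(c_string):
    # Case 1: char* -> bytes, the conversion that happens automatically.
    return c_string

def to_unicode(c_string, encoding):
    # Case 2: char* -> unicode, i.e. "s.decode(enc)".
    return c_string.decode(encoding)

def to_native_str(c_string, length=None, encoding=None):
    # Case 3: hypothetical stand-in for the proposed "cython.str()".
    # Returns bytes on Py2 and unicode on Py3.
    if c_string is None:
        # The suggested NULL -> None mapping.
        return None
    if length is not None:
        # Mirrors "cython.str(s[:length])".
        c_string = c_string[:length]
    if sys.version_info[0] < 3:
        return c_string
    if encoding is None:
        # Assumption: the interpreter default stands in for the platform encoding.
        encoding = sys.getdefaultencoding()
    return c_string.decode(encoding)
```

Under Py3, to_native_str(b"hello world", length=5) gives a unicode
string, while to_native_str(None) gives None instead of raising.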

>
>>
>> I think all of these are easy enough to type and read. So isn't that all
>> we need for that direction?
>>
>
> For brand-new code, I think you are right.
>
>> Or is it really the encoding name that you want to
>> keep users from typing?

Not at all. I'm up for a default, but that's not my issue. The  
motivating issue is that the user has to manually do something /every  
time/ a char* is converted. I would guess in most (almost all)  
applications one wants the same behavior throughout a whole module.
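
A minimal Python sketch of what "the same behavior throughout a whole
module" could look like, assuming the policy is picked once and reused.
The names make_converter and char_ptr_to_py are hypothetical, a char* is
again modelled as bytes and NULL as None. In Cython the point would be
for the compiler to apply such a policy at each implicit conversion,
which plain Python cannot show, so the call here is explicit.

```python
def make_converter(encoding="utf-8", null_to_none=True):
    # Build a char* -> Python converter with a fixed policy.
    def convert(c_string):
        if c_string is None:
            if null_to_none:
                return None
            raise ValueError("NULL char* pointer")
        return c_string.decode(encoding)
    return convert

# Chosen once near the top of a module, then used everywhere in it.
char_ptr_to_py = make_converter(encoding="latin-1")
```

Changing the encoding or the NULL handling then means editing one line
rather than every conversion site.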

> I think Robert's concern is the HUGE amount of code that should have
> to be reviewed/modified in SAGE.

Yes, that is a big concern, though of course this would be a one-time
patch that someone could just sit down for X hours and do. But then
new code (not written or refereed by me) would go in and leak bytes
objects, probably unintentionally, once we finally migrate to Py3 (we
depend on a lot of upstream projects doing so first). Down the road
that would get reported as a bug and (finally) be corrected.

One could argue that one should train all developers, make checking
that char* conversions are handled correctly part of the referee
process, etc., but there's already a huge number of things for new
developers to learn and get familiar with. I also strongly believe the
maxim that changing a system is vastly easier than changing (and
maintaining) human behavior. Even if everyone did this, in our case it
would be 98% busywork for our project; I'd rather developers and
referees spend their limited time thinking about more relevant things.
I would guess many other projects are in the same boat, and will be
surprised when they try to run their code under Py3 and all of a
sudden bytes objects are returned all over.

I see a (weak) analogy to memory management. For some use cases,
manually managing memory is important. For others, it's unneeded
bookkeeping that the developer would be better off not having to think
about at every step.

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
