Re: [Cython] C string literals

Stefan Behnel Mon, 06 Sep 2010 23:46:51 -0700

Robert Bradshaw, 07.09.2010 01:53:
> On Mon, Sep 6, 2010 at 11:20 AM, Stefan Behnel wrote:
>> Robert Bradshaw, 06.09.2010 19:01:
>>> On Mon, Sep 6, 2010 at 9:36 AM, Dag Sverre Seljebotn wrote:
>>>> I don't understand this suggestion. What happens in each of these cases,
>>>> for different settings of "from __future__ import unicode_literals"?
>>>>
>>>> cdef char* x1 = 'abc\u0001'
>>
>> As I said in my other mail, I don't think anyone would use the above in
>> real code. The alternative below is just too obvious and simple.
>>
>>
>>>> cdef char* x2 = 'abc\x01'
>>>
>>> from __future__ import unicode_literals (or -3)
>>>
>>>       len(x1) == 4
>>>       len(x2) == 4
>>>
>>> Otherwise
>>>
>>>       len(x1) == 9
>>>       len(x2) == 4
>>
>> Hmm, now *that* looks unexpected to me.
>
> But this is *exactly* how Python handles it.
>
> x1 = 'abc\u0001'
> x2 = 'abc\x01'
> len(x1), len(x2)
>
> both with and without unicode_literals.


Not for byte strings.
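
To illustrate with plain Python rather than Cython, here is a minimal sketch of 
the difference. The `literal_len` helper is made up for this example; it only 
evaluates the literal's source text and silences the warning that newer Python 
versions emit for the unrecognised `\u` escape in a bytes literal.

```python
import ast
import warnings


def literal_len(source):
    """Evaluate the source text of a string literal and return its length."""
    with warnings.catch_warnings():
        # Newer Python versions warn that '\u' is not a valid escape
        # inside a bytes literal; that is exactly the case shown here.
        warnings.simplefilter("ignore")
        return len(ast.literal_eval(source))


# Text literals: \u0001 and \x01 both denote a single character.
text_u = literal_len(r"u'abc\u0001'")
text_x = literal_len(r"u'abc\x01'")

# Bytes literals: \x01 is one byte, but \u0001 is kept as six literal bytes.
bytes_u = literal_len(r"b'abc\u0001'")
bytes_x = literal_len(r"b'abc\x01'")

print(text_u, text_x, bytes_u, bytes_x)  # 4 4 9 4
```

So for a byte string the length of 'abc\u0001' is 9 no matter how unprefixed 
literals are interpreted.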

Seriously, what you are trying to push here is that users must decide if 
they prefix a char* literal with a 'b' or not, depending on the content of 
the string. Sometimes, Cython will force them to do it, sometimes, it will 
just work, even for calls to exactly the same function. Great. Why can't we 
*always* require a 'b' or *always* make it work as expected? What would be 
wrong with that?


>> The way I see it, a C string is the
>> C equivalent of a Python byte string and should always and predictably
>> behave like a Python byte string, regardless of the way Python object
>> literals are handled.
>
> Python bytes are very different than strings. C (and most C libraries)
> use char* for both strings and binary data.

No. They use it for binary data and *encoded* text content, even if the 
encoding is ASCII. That's different. The fact that they accept text content 
encoded in ASCII, CP1250, UTF-8, UCS4, Latin-15, Kanji or whatever doesn't 
mean they know what Unicode is or even how to handle text. They may just 
store it away as binary, they may interpret it as a filename encoded in a 
platform-specific way, or they may pass it to a recoder. Cython can't know. 
The user will know it, though, and will (in almost all cases) pass content 
that suits the other side, be it ASCII encoded or not.
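
A small Python sketch of the point, assuming a made-up sample word that is not 
from this thread: the same text turns into different byte sequences depending 
on the encoding, and the bytes alone do not say which one was used.

```python
# Hypothetical sample text, chosen only because it contains one non-ASCII character.
text = u"caf\u00e9"

encodings = ("utf-8", "iso-8859-15", "cp1250", "utf-32-le")
encoded = {name: text.encode(name) for name in encodings}

for name in encodings:
    data = encoded[name]
    # Same text, different bytes and different lengths per encoding.
    print(name, len(data), data)

# Decoding with the encoding that produced the bytes recovers the text.
roundtrip = encoded["utf-8"].decode("utf-8")

# Decoding with a different encoding succeeds silently but yields other text.
misread = encoded["utf-8"].decode("iso-8859-15")
```

A receiver of the char* sees only the bytes; whether they get stored, treated 
as a filename or recoded is up to the library, which is why the caller has to 
pick the encoding.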

Could you comment on this please?

http://permalink.gmane.org/gmane.comp.python.cython.devel/10243

I think I made it pretty clear there what I think the two suitable 
alternatives are.

Stefan