On Dec 12, 2009, at 3:19 AM, Nathaniel Smith wrote:

> On Sat, Dec 12, 2009 at 1:51 AM, Stefan Behnel <[email protected]> wrote:
>> Nathaniel Smith, 12.12.2009 10:05:
>>> After upgrading to Cython 0.12 today (Python 2.5.2, x86-64, linux),
>>> some code of mine broke. Specifically, it's code for reading a binary
>>> format, and in the tests I had a string that made Cython fail to
>>> compile with the error:
>>>   String decoding as 'UTF-8' failed. Consider using a byte string or unicode string explicitly, or adjust the source code encoding.
>>>
>>> As an example, here's a complete file that Cython 0.12 will refuse to compile:
>>> -------------
>>> s = "\x12\x34\x9f\x65"
>>> -------------
>>>
>>> I'm not sure why it's nattering about the source code encoding when
>>> the problem is with explicitly quoted byte values
>>
>> Because you are using a 'str' literal, which needs to be decoded in Python
>> 3 to become the equivalent str (i.e. unicode) object. A check for that is
>> required for the semantics of the 'str' type in Cython, as it would
>> otherwise be impossible to switch the type in the generated C code - you
>> simply can't write out a unicode literal into C in a portable way.
>>
>> The relevant CEP is here:
>>
>> http://wiki.cython.org/enhancements/stringliterals
>
> Sure, I know. But I'm not using Python 3 (I'm using 2.5.2, as
> mentioned), and that page says "Unmarked string literals, when used in
> a Python context, would be [...] byte strings in Py2", and the table
> labeled "Proposal" seems to imply that in Py2, cython will treat "foo"
> and b"foo" as equivalent (just as CPython would). Similarly, under
> "Cons" it notes that the changes under discussion may cause backwards
> compatibility problems when moving from Py2 to Py3, but it does not
> note that they also cause (IMHO rather more serious) backwards
> incompatibility between Cython 0.11+Py2 and Cython 0.12+Py2.
>
>>> but... my question
>>> is, I can fix this by adding a "b" sigil on the front, but that's
>>> incompatible with earlier versions of Cython.
>>
>> Yes, bytes literals were fixed up fairly recently - may have been 0.11 or
>> so. Given that they were partly broken before that, I don't really see why
>> you would want to support earlier versions of Cython anyway.
>
> Oh, does that work in 0.11? All the documentation I had found (e.g. at
> the top of that page you linked) only mentions py3-style string
> handling in the context of 0.12. That solves my personal problem.

If you really want byte strings, then I agree that prefixing with b is
the best option.
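
For anyone wondering why the original literal trips the UTF-8 check,
here is a rough illustration in plain Python 3 (not Cython itself). It
assumes, as the error message suggests, that Cython 0.12 decodes the
bytes of an unmarked literal as UTF-8; the helper decodes_as_utf8 is a
hypothetical name made up for this sketch.

```python
# Sketch in plain Python 3, not Cython. Assumption: Cython 0.12 checks an
# unmarked literal by decoding its bytes as UTF-8, per the error message.

# The four byte values from the failing literal, written as a bytes literal.
data = b"\x12\x34\x9f\x65"


def decodes_as_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(data))                      # 4: a b"..." literal is just four bytes
print(decodes_as_utf8(data))          # False: 0x9f is a continuation byte with no lead byte
print(decodes_as_utf8(b"\xc3\xbf"))   # True: this pair is the UTF-8 encoding of U+00FF
```

With the b prefix the value is taken as raw bytes and no decoding step
is involved, which is why that spelling sidesteps the error.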

However, I agree with your assessment of backwards incompatibility. Consider

     len("\xc3\xbf")

In both Python 2 and Python 3 this gives 2, but in Cython it gives 2
when compiled against 2.x and 1 when compiled against 3.x. That seems
inconsistent. Given that "abc\xFF" works in both Py2 and Py3, wouldn't
it make more sense for it to also work (with the same behavior) in
Cython? The underlying representation would differ, but in both cases
it would be the unambiguous 4-character (bytes or unicode) string that
one would get by typing the same thing at the Python prompt. Thus when
one writes "abc\xFF", it would be interpreted as the actual value of
the string, not as an encoded form of it.
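
A rough model of the len() discrepancy, again in plain Python 3 rather
than Cython. The middle step only mimics what the Py3 build reportedly
does (decoding the escaped bytes as UTF-8). The Latin-1 round trip at
the end is an assumption of this sketch about how the "actual value"
reading would map back to Py2 bytes; it works because code points
U+0000 to U+00FF correspond one-to-one to byte values 0x00 to 0xFF.

```python
# Sketch in plain Python 3, not Cython.

# Python 3 semantics: \x escapes in a str literal denote code points.
literal = "\xc3\xbf"
print(len(literal))   # 2, matching len("\xc3\xbf") on a Py2 byte string

# Mimics the reported Cython-on-Py3 behavior: escapes treated as UTF-8 bytes.
as_utf8 = b"\xc3\xbf".decode("utf-8")
print(len(as_utf8))   # 1, the single character U+00FF

# Proposed reading: the escapes are the string's actual value.
# Assumption of this sketch: Latin-1 maps that value back to the Py2 bytes.
value = "abc\xFF"
print(len(value))                                # 4
print(value.encode("latin-1") == b"abc\xff")     # True
```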

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
