Nathaniel Smith, 12.12.2009 10:05:
> After upgrading to Cython 0.12 today (Python 2.5.2, x86-64, linux),
> some code of mine broke. Specifically, it's code for reading a binary
> format, and in the tests I had a string that made Cython fail to
> compile with the error:
> String decoding as 'UTF-8' failed. Consider using a byte string or
> unicode string explicitly, or adjust the source code encoding.
>
> As an example, here's a complete file that Cython 0.12 will refuse to compile:
> -------------
> s = "\x12\x34\x9f\x65"
> -------------
>
> I'm not sure why it's nattering about the source code encoding when
> the problem is with explicitly quoted byte values

Because you are using a 'str' literal, which needs to be decoded in Python 3
to become the equivalent str (i.e. unicode) object. A check for that is
required for the semantics of the 'str' type in Cython, as it would
otherwise be impossible to switch the type in the generated C code - you
simply can't write out a unicode literal into C in a portable way.

The relevant CEP is here:

http://wiki.cython.org/enhancements/stringliterals

> but... my question
> is, I can fix this by adding a "b" sigil on the front, but that's
> incompatible with earlier versions of Cython.

Yes, bytes literals were fixed up fairly recently - may have been 0.11 or
so. Given that they were partly broken before that, I don't really see why
you would want to support earlier versions of Cython anyway.

> Is there any way to
> write this string that will work with all versions of Cython?

I'd just drop support for earlier Cython versions and go with an explicit
b'...' literal.

> (And was it really intentional to break Python source compatibility so badly?)

What do you mean? And what version of Python are you referring to?

Stefan
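
For illustration, here is a rough sketch in plain Python 3 of why those four bytes cannot become a str object. It only assumes that the check amounts to decoding the literal's bytes as UTF-8, which is what the error message suggests; the helper name decodes_as_utf8 is made up for this example and is not part of Cython.

```python
# The four byte values from the failing literal "\x12\x34\x9f\x65",
# written here as an explicit bytes literal.
raw = b"\x12\x34\x9f\x65"


def decodes_as_utf8(data):
    """Return True if `data` is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0x9f is a UTF-8 continuation byte with no lead byte before it,
# so the sequence as a whole is not valid UTF-8.
print(decodes_as_utf8(raw))

# Without the 0x9f byte, everything left is plain ASCII and decodes fine.
print(decodes_as_utf8(b"\x12\x34\x65"))
```

With the b'...' prefix the value is kept as raw bytes and never decoded, so the question of validity does not come up.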
