On Sat, Sep 4, 2010 at 9:24 PM, Stefan Behnel <[email protected]> wrote:
> Robert Bradshaw, 04.09.2010 22:04:
>> How about we parse the literals as unicode strings, and if used in a
>> bytes context we raise a compile time error if any characters are
>> larger than a char?
>
> Can't work because you cannot recover the original byte sequence from a
> decoded Unicode string. It may have used escapes or not, and it may or may
> not be encodable using the source code encoding.

I'm saying we shouldn't care about using escapes, and should raise a
compile time error if it's not encodable using the source encoding. In
other words, I'm not a fan of

    foo("abc \u0001")

behaving (in my opinion) very differently depending on whether foo
takes a char* or object argument. I'd rather have it be decoded as a
unicode string when reading the source, then if need be re-encoded as
bytes if possible. Probably the simplest thing to do here is only
allow ASCII in such string literals--this will handle both the common
case and I don't think it's a stretch for the user to have to be
explicit about b"..." vs u"..." for literals with high-value code
points.
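
A minimal sketch of the rule proposed above, written in plain Python 3 rather than as part of Cython's compiler. It assumes the literal has already been decoded to unicode. `coerce_literal` and `CompileError` are hypothetical names, and the strings "object" and "char*" stand in for the two possible argument types of `foo`.

```python
# Sketch of the proposed rule; CompileError and coerce_literal are
# hypothetical names, not part of Cython.


class CompileError(Exception):
    """Stand-in for a compile-time diagnostic."""


def coerce_literal(value: str, target: str):
    """Coerce an unprefixed literal that was already decoded to unicode.

    target is "object" (keep the unicode string) or "char*" (needs bytes).
    """
    if target == "object":
        return value
    if target == "char*":
        try:
            # Only ASCII may be re-encoded implicitly.
            return value.encode("ascii")
        except UnicodeEncodeError as exc:
            raise CompileError(
                f"non-ASCII character {value[exc.start]!r} in a char* literal; "
                'use an explicit b"..." or u"..." literal'
            ) from None
    raise ValueError(f"unknown target type: {target!r}")


# foo("abc \u0001") now means the same five characters for either argument type.
as_object = coerce_literal("abc \u0001", "object")
as_char_ptr = coerce_literal("abc \u0001", "char*")
assert len(as_object) == len(as_char_ptr) == 5

# Under plain byte-string (Python 2 str) semantics, by contrast, \u is not an
# escape, so the backslash and the digits survive as separate characters.
assert len(b"abc \\u0001") == 10
```

Restricting the implicit re-encoding to ASCII sidesteps the source-encoding question entirely: anything beyond ASCII is rejected, and the user has to choose b"..." or u"..." explicitly.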

- Robert
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
