Robert Bradshaw, 07.09.2010 20:16:
> On Tue, Sep 7, 2010 at 3:31 AM, Stefan Behnel wrote:
>> Robert Bradshaw, 07.09.2010 10:20:
>>>> Could you comment on this please?
>>>>
>>>> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243
>>>>
>>>> I think I made it pretty clear there what I think the two suitable
>>>> alternatives are.
>>>
>>> Yes, you favor either (1) re-interpretation of the literal depending
>>> on the type context they're used in or (2) disallowing interpretation
>>> of string literals when unicode literals are enabled.
>>>
>>> I think (1) is a bad path to take and would prefer not to burden users
>>> with (2).
>>
>> So, what about doing the following then:
>>
>> 1) we keep the current implementation as is, i.e. unprefixed string
>> literals can coerce to char* literals during type analysis that match the
>> byte sequence in the source file and properly handle byte escapes
>
> I'd be more OK with that, except for I'd rather have consistent
> handling of the \u escape. The -2 behavior is the same, the -3
> behavior as below, so the from __future__ import unicode_literals is
> more of an intermediate step, so not quite as important in the long
> run.

I think so, too. In the long run, users should be able to appreciate -3
more than the partial imports. There's still some way to go to get it
rolling smoothly (see Lisandro's "str" problem), but that'll come over time.

>> 2) with the -3 option, we disallow byte values > 127 in byte string
>> literals and do not generate a byte string representation for unprefixed
>> string literals that contain them, thus effectively preventing their
>> coercion to char*
>>
>> That's basically the ASCII-only proposal with added escapes, and my
>> proposal minus non-ASCII literal characters. Should make life easy for
>> basically everyone, with the added benefit of increasing the compatibility
>> with Python 3.
>
> +1

Here's an attempt:

http://hg.cython.org/cython-devel/rev/8f4cda480124

Hudson complains about one of the tests in Py<=2.5, but I should be able to
fix that.

>> We may additionally consider warning about '\u...' in unprefixed char*
>> strings. I think this particular case will be rare enough to encourage a
>> 'b' prefix or a '\\' escape.
>
> If we do this, we should have a warning for sure.

It's generally valid Python to put a plain Unicode escape sequence into a
byte string, but a warning will make it clear that it does have a code
smell to do that because it makes the literal look like something that it
is not. I think that in the context of char* literals, we are free to
decide either way (as long as the char* context doesn't occur due to an
internal optimisation of Cython...)

> I'd love to hear what others think.

Sure. Please give it a try.

Stefan
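
For illustration only, here is a rough plain-Python sketch of one reading
of rule 2) above. It is not the code from the changeset. The helper name
can_coerce_to_char_ptr is made up, and it assumes that both literal
non-ASCII characters and escapes yielding values above 127 block the
coercion to char*.

```python
# Hypothetical helper, not part of Cython: models one reading of the
# proposed -3 rule for unprefixed string literals.
def can_coerce_to_char_ptr(literal_value):
    """Return True if the literal could still become a char* literal.

    `literal_value` is the content of an unprefixed literal after escape
    processing, given as a Python (unicode) string. Under the assumed
    reading, any character value above 127 prevents the coercion.
    """
    return all(ord(ch) < 128 for ch in literal_value)


# Plain ASCII, including ordinary escapes such as "\n", stays coercible.
print(can_coerce_to_char_ptr("abc\n"))    # True

# A value above 127, here U+00E9 written as an escape, blocks the coercion.
print(can_coerce_to_char_ptr("caf\xe9"))  # False
```

Under this reading, a user who really wants the non-ASCII bytes in a char*
would have to spell them out with a 'b' prefix.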
