On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: the narrow build was called
> > "ucs2" for a long time, even though it is UTF-16 (each character is
> > encoded as one or two UTF-16 words).
>
> No, no, no :-)
>
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".

OK for Python 2, where the narrow build behaves like UCS-2: ord() rejects
a surrogate pair:

$ ./python
Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49)
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010ffff'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found

But Python 3 does use UTF-16 for the narrow build: ord() accepts a
surrogate pair and returns the code point:

$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10ffff); len(c)
2
>>> ord(c)
1114111
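
For illustration, the arithmetic behind that surrogate pair can be sketched
as below. This is only a sketch: to_surrogate_pair() and
from_surrogate_pair() are hypothetical helper names, not CPython functions,
and it assumes a Python 3 interpreter. The result is cross-checked against
the standard "utf-16-be" codec.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a non-BMP code point into UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of CPython.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10FFFF)
print(hex(high), hex(low))                 # 0xdbff 0xdfff
print(from_surrogate_pair(high, low))      # 1114111

# Cross-check: the UTF-16 codec produces the same two 16-bit words.
units = struct.unpack(">2H", chr(0x10FFFF).encode("utf-16-be"))
print(units == (high, low))                # True
```

This is the same combination step ord() has to perform on a narrow
build to return 1114111 for a string of length 2.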
Victor