On Sat, 12 Feb 2011, Adam Borowski wrote:
> On Fri, Feb 11, 2011 at 08:16:54PM -0200, Henrique de Moraes Holschuh wrote:
> > 2. Anything that cannot deal with Supplementary planes.
> >
> > This includes the use of UCS-2 instead of UTF-16, as it cannot represent
> > the Supplementary planes.  python 3 when not compiled to use UCS-4 memory
> > hog mode is an example, I am told.
>
> Using UCS-2 is hardly better than using ISO-8859-1 or any other ancient
> charset.  Using either UTF-16 or UCS-4 can be a memory hog, that's why to
> pick UTF-8 for regular use.  Except for some rare cases (CJK with no

Python 3 uses UCS-2 (or UCS-4) for the internal representation.  Likely
they wanted something that made it easy to address each character in a
Unicode string in O(1).  That might actually give better performance,
given how much people like to do string slicing and splicing in Python.
The O(N) scan that indexing into UTF-8 or UTF-16 often requires might
well be more painful than the much larger data cache footprint of
UCS-4... but that is a damn big *maybe*, and very unlikely to be
consistent across very different architectures.

Well, not like I care.  I don't even have Python 3 installed, and I will
only install it the day something I need pulls it in as a dependency.

> Picking a random subset of Unicode is like putting day-of-the-year in one

UCS-2 is deprecated as all heck.  As far as I could research through
Google, it has not been a valid Unicode representation since Unicode 2.0
(i.e. 1996).  So it wouldn't even count as a "random subset of Unicode".

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Archive: http://lists.debian.org/20110212035533.ga32...@khazad-dum.debian.net
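
For illustration, a rough sketch of the indexing trade-off discussed above.
It is not how CPython does it internally; the helper names utf8_index and
ucs4_index are made up for this example. It assumes well-formed input and
only shows that finding the n-th code point in UTF-8 means scanning from
the start, while a fixed-width 4-byte encoding allows a direct offset, at
the cost of more memory.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 bytes by scanning: O(N)."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


def ucs4_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE bytes by offset: O(1)."""
    if not 0 <= n < len(data) // 4:
        raise IndexError(n)
    return data[4 * n:4 * n + 4].decode("utf-32-le")


# 1-, 2-, 3- and 4-byte UTF-8 sequences, followed by plain ASCII.
# U+1D11E lies in a Supplementary plane.
text = "a\u00e9\u20ac\U0001D11E!"
utf8 = text.encode("utf-8")
ucs4 = text.encode("utf-32-le")

print(len(utf8), len(ucs4))  # byte footprint of each encoding
print(utf8_index(utf8, 3) == ucs4_index(ucs4, 3))

# In UTF-16 the Supplementary-plane character takes two 16-bit units
# (a surrogate pair), which is exactly what UCS-2 cannot express.
print(len("\U0001D11E".encode("utf-16-le")) // 2)
```

Both helpers return the same character for the same index; only the cost
of getting there differs. Whether the O(1) lookup is worth the extra
memory is the open question raised above, and this sketch does not
measure it.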