On Sunday 14 April 2013 08:54:31 Ivanko B wrote:
> I don't think you would like I would
> make an "instant switch" to that. ;-)
> ================
> Then UTF16 with surrogate pairs? Won't it have significant
> performance issues like with UTF8? Some performance testing (by
> testcases written by FreepascalRu guys) has revealed that UTF8 is
> several times or even orders of magnitude slower than UCS2.
>
I don't know if it is useful to repeat this again, because it seems nobody reads or believes it. Anyway, once again.

The code units used for UTF-16 surrogate pairs do not overlap with any UCS2 character, so there is no problem having surrogate pairs in a UCS2 string as long as a pair is not split in the middle.

If one searches a UTF-16 string for a character constant by character index, one knows that it is a single 16-bit code unit if the searched character is in the BMP (Basic Multilingual Plane). Searching for a substring that contains surrogate pairs works without further measures.

What is more difficult is to determine the count of glyphs in a string. That is difficult in UCS4 too, because there could be decomposed characters in the string.
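
The points above can be checked with a small sketch. It is written in Python rather than Pascal and is not MSEgui code; the helper names (utf16_units, find_units, count_code_points) are made up for this illustration. It treats a string as a plain list of 16-bit code units, which is roughly how a UCS2 string type sees it.

```python
import unicodedata

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def is_surrogate(unit):
    """True for code units reserved for surrogate pairs (0xD800..0xDFFF)."""
    return 0xD800 <= unit <= 0xDFFF

def find_units(haystack, needle):
    """Naive search on code units; returns a code unit index or -1."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1

def count_code_points(units):
    """Count code points by skipping the second (low) half of each pair."""
    return sum(1 for u in units if not (0xDC00 <= u <= 0xDFFF))

# 'a', U+1D11E (outside the BMP), 'b', precomposed e-acute
text = "a\U0001D11Eb\u00e9"
units = utf16_units(text)

# The non-BMP character becomes two code units, both in the surrogate range.
pair = units[1:3]                      # [0xD834, 0xDD1E]
bmp_only = [u for u in units if not is_surrogate(u)]

# A BMP character is one code unit and can never match half of a pair.
pos_b = find_units(units, utf16_units("b"))

# A substring containing a surrogate pair is found by the same plain search.
pos_sub = find_units(units, utf16_units("\U0001D11Eb"))

# Code units and code points differ once a surrogate pair is present.
n_units = len(units)
n_points = count_code_points(units)

# Decomposed characters: 'e' + combining acute is one glyph but two code
# points, in UTF-16 and in UCS4 alike.
decomposed = "e\u0301"
n_decomposed = len(decomposed)
n_composed = len(unicodedata.normalize("NFC", decomposed))

print(pos_b, pos_sub, n_units, n_points, n_decomposed, n_composed)
```

The search never needs to know about surrogates, because the range 0xD800..0xDFFF is not assigned to any BMP character. Only the glyph count needs extra work, and the last lines show that this is a normalization question, not a UTF-16 one.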
Martin

_______________________________________________
mseide-msegui-talk mailing list
https://lists.sourceforge.net/lists/listinfo/mseide-msegui-talk

